ABSTRACT


A decentralized many-person decision problem is one where each decision
maker has different information. If one decision maker's information
depends on what another decision maker has done, the information is
called "dynamic." In the past, problems involving dynamic information
have been very difficult, if not impossible, to solve. Two specific
examples which have been solved, one from economic theory and the other
from classical information theory, will be investigated. It will be
shown that they can be formulated as two-person decision problems with
the type of dynamic information structure called "signaling." The first
example involves a model of the job market as a nonzero-sum game. New
equilibrium solutions are found and properties of these solutions, such
as stability, multiple solutions, and threshold effects of signaling
cost and noise, are studied. The second example models the Shannon
problem as a team theory problem. The concept of real-time information
theory is introduced, where source and channel sequences are of a fixed
length, and general results about real-time solutions are proved and
demonstrated.
CHAPTER I


INTRODUCTION 

Information available to a decision maker (DM) not only influences his
actions, but also determines whether a solution to the decision problem
exists at all. The role of information becomes particularly complicated
when there is more than one DM, especially if each DM has different
information. In order to study how information influences decision
making in many-person decision problems, two specific examples, one
from economic theory and one from classical information theory, will be
examined. It will be shown that they can be formulated as two-person
decision problems. This formulation provides a framework for studying
each problem's information structure, that is, "who knows what." These
particular examples have been chosen because they both exhibit a
special type of information structure called "signaling." In the past
(see [1], [4], and [7]), problems involving this type of information
structure have been very difficult, if not impossible, to solve.
However, these two examples can be solved (in a sense to be defined).
Thus, they provide insights into possible new solution techniques.

Before going on to the two problems in detail, we first will define
more precisely what is meant by a many-person decision problem with
"signaling."

Suppose there are N decision makers, with the i-th DM denoted as DMi.
First of all, let $x \in \Omega$ be a random variable representing the
state of the world (or state of nature) that each DM would like to
know, with probability density $p(x)$. Secondly, for $i = 1, \ldots, N$,
DMi's information $z_i \in Z_i$ is a function of x, written as
$z_i = h_i(x)$. When $h_i \neq h_j$ for $i \neq j$, then we say the
problem is "decentralized," since each DM is making decisions based on
different information. Thirdly, DMi's action, or decision,
$u_i \in U_i$ is a function of his information, expressed as
$u_i = \gamma_i(z_i)$, where $\gamma_i$ is called a "strategy" or
"decision rule." Lastly, DMi's objective, or payoff, function $J_i$ is
the expectation of a function of x and all of the DMs' strategies. Each
DMi now faces the following problem: choose a strategy $\gamma_i$ from
a class of specified admissible strategies (usually taken to be the
class of measurable functions from $Z_i$ to $U_i$) to minimize

$$J_i(\gamma_1, \ldots, \gamma_i, \ldots, \gamma_N).$$

This is not the most general definition of information, but is
sufficient for our purposes at this time. We are considering only
nonclassical information in that each decision maker has different
information [8].

The information to DMi can be modified to include not only $z_i$, a
measurement of the state x, but also the actions of the other decision
makers. For example, suppose DMi's information also includes $u_j$, the
action of DMj, $j \neq i$. Thus, a sense of order is conveyed in that
DMj acts before DMi, and DMi observes this action. When this happens,
that is, when DMi's information depends on what another person has
done, we say that DMi has a dynamic information structure [2], [3].
Otherwise, the information structure is called static. The former is
the type of information structure that occurs in the two examples to be
studied.*

Sometimes a dynamic information structure can be reduced to a static
one [2]. However, one example where this is not the case is the type of
dynamic information structure we call "signaling." For the case of
N = 2, let DM1's action be denoted as u and DM2's as v. Then signaling
is defined as the type of dynamic information where DM1's information
is just x and DM2's information is just u. In other words, DM1
"signals" his knowledge of x to DM2 through his action
$u = \gamma_1(x)$. DM2 must now infer x from u and choose
$v = \gamma_2(u)$.

Chapter  II  examines  an  economic  application  of  signaling  based  on 
a model  of  the  job  market  by  Spence  [6].  Although  Spence  used  the 
term  "signaling"  to  describe  the  type  of  information  transfer  in  the 
job  market,  Chapter  II  extends  the  model  by  formulating  the  problem  as 
a two-person  decision  problem.  The  reason  for  this  is  twofold:  first 
of  all,  we  immediately  see  that  Spence's  model  is  an  example  of  a 
problem  with  (nonclassical)  dynamic  information  that  can  be  solved. 

For  this  reason,  it  provides  an  excellent  vehicle  for  studying  this  type 
of  information  structure.  Secondly,  this  set-up  allows  us  to  find  new 
solutions  and  investigate  different  properties  of  the  solutions.  Although 
the  lack  of  detail  in  the  model  prevents  us  from  asserting  the  absolute 
validity  of  the  economic  issues  raised,  the  decision-  and  control- 
theoretic  framework  provides  qualitative  insights  into  modeling  the 
transfer  of  information. 

*Problems with a dynamic information structure are difficult to solve
because the underlying probability distributions needed to find the
solution are themselves solution-dependent. See [8] and [3] for details.
Chapter III deals with problems in Shannon theory, also sometimes
referred to as classical information theory, which addresses the
problem of coding a message and sending it through a noisy
communication channel [5]. At first glance, this may sound unrelated to
the economics-oriented Spence problem of Chapter II, but we will show
that this problem also can be modeled as a two-person decision problem
with a signaling information structure, only now the DMs form what will
be described as a "team." To correspond more accurately to the Spence
problem, the formulation will be modified to introduce the concept of
"real-time information theory." This provides decision and control
theorists with an understanding of information theory in their own
terms. On the other hand, it provides information theorists with an
entirely new way of looking at Shannon theory.
CHAPTER II


MARKET  SIGNALING  AND  THE  SPENCE  MODEL 


1.  Introduction


In  the  job  market  model  of  Spence  [1],  [2],  an  employer  must  hire 
someone  for  a job  without  knowing  how  productive  that  individual  will  be. 
In  other  words,  the  employer  has  imperfect  information  about  an 
individual's  ability.  Spence  suggests  that  the  employer  can  improve  his 
information  by  looking  on  the  job  application  for  some  signal,  such  as 
educational  level.  The  employer  offers  wages  based  on  the  signal  he 
sees;  that  is,  a person  with  more  education  is  offered  higher  wages, 
because  the  employer  believes  that  the  higher  education  indicates  higher 
ability.  The  individual  applying  for  the  job,  on  the  other  hand,  knowing 
he  will  receive  wages  based  on  his  educational  level,  must  decide  how 
much  education  to  get,  taking  into  consideration  that  education  is  costly. 
When  the  employer's  beliefs  about  the  relationship  between  ability  and 
education  are  confirmed  by  what  the  individuals  actually  do,  then  we 
have  what  Spence  calls  an  equilibrium. 

An  interesting  feature  of  this  model  is  that  there  are  multiple 
equilibrium  solutions.  In  this  chapter,  we  explain  why  this  is  true  and 
prove  new  results  about  the  Spence  model.  In  order  to  do  this,  the 
model  is  formulated  as  a two-person  nonzero-sum  noncooperative 
decision  problem  with  imperfect  and  dynamic  information.  The  purpose 
of  this  is  to  clearly  display  the  decision  and  control  theoretic  nature  of 
the problem, in particular the role played by the dynamic information
structure. Under this formulation, new classes of multiple equilibria
can  be  found  and  an  explicit  method  for  computing  these  new  equilibria 
is  given.  Also,  different  properties  of  the  solutions  are  investigated, 
such  as  stability  and  threshold  effects. 

2.  Problem  Statement 

All potential employees will be considered together as decision maker
one (DM1) and the employer as decision maker two (DM2). DM1's
information is natural ability, denoted by the variable x. That is,
each person makes a decision based on knowledge of his own true
ability. This can be expressed as a mapping from ability to educational
level, denoted by $\gamma_1(x) = u$, where u is the variable
representing educational level. (We assume $x > 0$ and $u \geq 0$ to
rule out the meaningless notions of "negative" ability and "negative"
amounts of education.) The employer's information is the signal u, and
his strategy is to offer wages as a function of education, denoted
$\gamma_2(u) = v$, where v represents wages. We immediately see that
this is the type of dynamic information structure defined in Chapter I
as "signaling," where educational level u is the signal.

In Spence's model, signaling costs $c(u,x)$ and productivity $s(u,x)$*
are functions of education level and ability. Each individual applying
for a job chooses the educational level to maximize his net profit, the
difference between his wages and costs. For DM1, the entire employee
population, the goal is to maximize the expected net profit, with
expectation taken over the variable x. We assume that everyone,
including the employer, knows the distribution of ability types
throughout the population. Thus, the payoff function $J_1$ for the
individuals is written

$$J_1(\gamma_1, \gamma_2) = E[\gamma_2(\gamma_1(x)) - c(\gamma_1(x), x)].$$

This is the same criterion that Spence proposes although he does not
consider it in the context of a two-person decision problem.

*Thus, educational level u not only serves as a signal about x, but
also affects productivity directly when $s(u,x)$ is an explicit
function of u.

Assuming utility units are appropriately defined, the employer would
like to pay people no more than what they are worth; that is, he wants
wages v to be less than or equal to productivity s. However, if v is
strictly less than s, another employer could come along and offer wages
greater than v but less than s, attract employees away from the first
employer, and still make a profit. We will combine this idea of
competition with the original proposal that wages not be greater than
productivity in a single loss function for the employer by penalizing
any deviation from s. Hence, the employer wants to choose a wage
schedule $\gamma_2$ to minimize the quadratic loss function

$$J_2(\gamma_1, \gamma_2) = E[(\gamma_2(\gamma_1(x)) - s(\gamma_1(x), x))^2].$$

$J_2$ is a mathematical device to allow us to (1) reproduce Spence's
result under our setup, and (2) focus on the equilibrium under
competition without bringing in competition explicitly, thus avoiding
the complication of a three-person decision problem. In Section 6 we
will, however, discuss the issue of competition directly, as was done
in [3] and [4].

The problem is now in strategic form; i.e., the goal is to find the
"optimal" strategies $\gamma_1^*$ and $\gamma_2^*$, where by optimality
we mean finding the noncooperative Nash equilibrium, sometimes referred
to as person-by-person optimality. This is defined as follows:
$(\gamma_1^*, \gamma_2^*)$ is a Nash equilibrium pair for the objective
functions $J_1$ (maximize) and $J_2$ (minimize) if and only if

$$J_1(\gamma_1^*, \gamma_2^*) \geq J_1(\gamma_1, \gamma_2^*) \quad \forall \text{ admissible } \gamma_1$$

$$J_2(\gamma_1^*, \gamma_2^*) \leq J_2(\gamma_1^*, \gamma_2) \quad \forall \text{ admissible } \gamma_2$$

That is, neither DM has the incentive to unilaterally deviate from the
equilibrium solution. By standard manipulations [5], the first order
necessary conditions for the Nash equilibrium are:

$$\max_u \, [\gamma_2(u) - c(u,x)] \;\Rightarrow\; \gamma_2'(u) - c_u(u,x) = 0$$

$$\min_v \, E_{/u}[(v - s(u,x))^2] \;\Rightarrow\; \gamma_2(u) = v = E_{/u}(s)$$

where ' denotes d/du, and $E_{/u}$ denotes $E(\cdot \mid u)$.* It is
clear that the second order sufficient conditions for the second
equation hold, since $J_2$ is quadratic.

*That is, instead of solving for the strategies $\gamma_1$ and
$\gamma_2$ in function space, we fix the arguments x and u and solve
for the variables u and v, respectively.
The difficulty now is that $p(x \mid u)$, the underlying probability
density function in the determination of $E_{/u}(s)$, is
solution-dependent; that is, it cannot be evaluated until $\gamma_1$ is
specified. A way out of this predicament is to guess that $\gamma_1$ is
a one-to-one function. Then knowledge of u implies knowledge of x, so
that $\gamma_2(u) = E_{/u}[s(u,x)] = s(u, x = \gamma_1^{-1}(u))$.
Spence proves in [1] and [2] that the second order conditions for the
first equation, namely, $\gamma_2'' - c_{uu} < 0$, are satisfied in
this case under the assumptions

i)   $c_u > 0$
ii)  $c_{ux} < 0$
iii) $s_x > 0$.

A particular example from [2] in which $\gamma_2 = x$ follows.

EXAMPLE 2.1: Let $c(u,x) = u/x$ and $s(u,x) = x$. Then $c_u = 1/x$
and, since $\gamma_2 = s = x$ when $\gamma_1$ is one-to-one, the first
order condition becomes

$$\gamma_2'(u) = \frac{1}{\gamma_2(u)}.$$

This is a differential equation in $\gamma_2$ and has the one-parameter
family of solutions

$$\gamma_2(u) = \sqrt{2u + 2k} \qquad (2.1)$$

where k is the parameter. Since $\gamma_2 = x$,

$$\gamma_1(x) = u = \frac{x^2}{2} - k. \qquad (2.2)$$

Since x > 0, then $\gamma_1$ in (2.2) is, in fact, one-to-one, and our
original assumption is verified. Equations (2.1) and (2.2) are the
equilibrium solutions derived in [1] and [2], so that the two-person
model recaptures Spence's original results.
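The consistency of (2.1) and (2.2) can be checked numerically. The sketch below is only an illustration: it assumes the cost c(u, x) = u/x, an arbitrary parameter value k = 0.25, and a plain grid search over u, none of which is prescribed by the text. For each ability x it maximizes the net profit against the wage schedule (2.1) and compares the maximizer with (2.2).

```python
import math

def gamma2(u, k):
    """Wage schedule (2.1): sqrt(2u + 2k)."""
    return math.sqrt(2.0 * u + 2.0 * k)

def gamma1(x, k):
    """Education choice (2.2): x^2 / 2 - k."""
    return 0.5 * x * x - k

def best_response(x, k, u_max=10.0, steps=20_000):
    """Grid search for the u in [0, u_max] maximizing gamma2(u) - u/x."""
    best_u, best_val = 0.0, -math.inf
    for n in range(steps + 1):
        u = u_max * n / steps
        val = gamma2(u, k) - u / x   # wage minus assumed cost c = u/x
        if val > best_val:
            best_u, best_val = u, val
    return best_u

K = 0.25  # illustrative value of the free parameter k
results = [(x, best_response(x, K), gamma1(x, K)) for x in (1.0, 1.5, 2.0, 3.0)]
for x, numeric, closed_form in results:
    print(x, numeric, closed_form)
```

The grid maximizer agrees with (2.2) up to the grid spacing, which is what the self-consistency argument requires.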

An important property of the solutions is that varying k produces a
continuum of multiple equilibria. Since

$$\frac{dJ_1^*(k)}{dk} = E\left(\frac{1}{x}\right) > 0$$

(where $J_1^*(k) = J_1(\gamma_1^*(k), \gamma_2^*(k))$, with
$\gamma_2^*$ and $\gamma_1^*$ as in (2.1) and (2.2)), the equilibria
parameterized by larger k give larger expected net profit to DM1 than
those with smaller k. For DM2, varying k does not matter, since
$\gamma_2$ always equals x and $J_2$ remains zero. When one equilibrium
solution is better than another solution for at least one DM without
harming the other DM, the former solution is called "Pareto-superior"
to the latter. Thus, solutions (2.2) with larger k are Pareto-superior
to those with smaller k.
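To see the dependence of DM1's payoff on k concretely, the following sketch evaluates the expected net profit by a midpoint rule. It assumes c(u, x) = u/x and s(u, x) = x, the strategies (2.1) and (2.2), and, purely for illustration, ability uniformly distributed on [1, 2]; the function name and the values of k are hypothetical choices.

```python
import math

def expected_net_profit(k, x_lo=1.0, x_hi=2.0, n=10_000):
    """Midpoint-rule estimate of J1(k) = E[wage - cost] for x ~ U[x_lo, x_hi]."""
    dx = (x_hi - x_lo) / n
    total = 0.0
    for i in range(n):
        x = x_lo + (i + 0.5) * dx
        wage = x                       # gamma2(gamma1(x)) = x in this equilibrium
        cost = (0.5 * x * x - k) / x   # c(u, x) = u/x with u = x^2/2 - k
        total += wage - cost
    return total / n

ks = (0.0, 0.1, 0.2, 0.3)              # k <= 0.5 keeps u >= 0 on [1, 2]
values = [expected_net_profit(k) for k in ks]
slope = (values[1] - values[0]) / (ks[1] - ks[0])
print(values, slope, math.log(2.0))    # slope approximates E(1/x) = ln 2 here
```

Under these assumptions the payoff is linear in k with slope E(1/x), so larger k is better for DM1 while DM2's loss stays at zero.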

Spence also works out an example where x and u are discrete random
variables [2]:

EXAMPLE 2.2: Let $x \in \{1, 2\}$. Let

    q     = fraction of population of type x = 1
    1 - q = fraction of population of type x = 2
    c = u/x,   s = x


Suppose the employer guesses a relationship between ability and
education that results in the following conditional density function
and wage schedule:

$$\Pr\{x = 1 \mid 0 \leq u < u^*\} = 1 \;\Rightarrow\; \gamma_2(u) = 1 \text{ for } u < u^*$$

$$\Pr\{x = 2 \mid u \geq u^*\} = 1 \;\Rightarrow\; \gamma_2(u) = 2 \text{ for } u \geq u^*$$

$\gamma_2$ is a two-level step function as shown in Figure 2.1. Since
the cost of education is a monotonically increasing function of u for
fixed x, the net profit $\gamma_2 - c$ will be maximized only at the
education levels u = 0 or $u^*$. Thus, the individuals will choose
either u = 0 or $u^*$.
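For x = 1,

$$\max_u \, [\gamma_2 - c] = \max \{1, \; 2 - u^*\}$$

For x = 2,

$$\max_u \, [\gamma_2 - c] = \max \left\{1, \; 2 - \frac{u^*}{2}\right\}$$

Therefore, in order to have consistency with the employer's beliefs,
we must have:

$$\gamma_1(x = 1) = 0 \;\Leftrightarrow\; 1 > 2 - u^*, \text{ or } u^* > 1$$

$$\gamma_1(x = 2) = u^* \;\Leftrightarrow\; 2 - \frac{u^*}{2} > 1, \text{ or } u^* < 2$$

$$\Leftrightarrow\; 1 < u^* < 2 \qquad (2.3)$$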


Inequality (2.3) is the equilibrium condition for this discrete
example. Varying the parameter $u^*$ between 1 and 2 again results in a
continuum of multiple equilibria. Also,

$$J_1^*(u^*) = 1 \cdot q + \left(2 - \frac{u^*}{2}\right)(1 - q)$$

$$J_2^*(u^*) = 0$$

As $u^*$ decreases, $J_1^*$ increases. Therefore, solutions with
smaller $u^*$ are Pareto-superior to those with larger $u^*$.
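The equilibrium condition for Example 2.2 can be reproduced with a few lines of code. This is a sketch under the example's stated data (types 1 and 2, c = u/x, wages 1 below the threshold and 2 at or above it); the function names are illustrative, and ties are resolved in favor of u = 0, a convention the text does not specify.

```python
def best_signal(x, u_star):
    """Signal in {0, u_star} maximizing wage minus cost u/x; ties go to u = 0."""
    payoff_low = 1.0                 # wage 1 at u = 0, no cost
    payoff_high = 2.0 - u_star / x   # wage 2 at u = u_star, cost u_star / x
    return u_star if payoff_high > payoff_low else 0.0

def is_separating(u_star):
    """True when type 1 picks u = 0 and type 2 picks u = u_star."""
    return best_signal(1, u_star) == 0.0 and best_signal(2, u_star) == u_star

def j1(u_star, q):
    """Expected net profit of DM1 when the two types separate."""
    return 1.0 * q + (2.0 - u_star / 2.0) * (1.0 - q)

for u_star in (0.5, 1.2, 1.8, 2.5):
    print(u_star, is_separating(u_star), j1(u_star, q=0.5))
```

Thresholds strictly between 1 and 2 separate the types, and among those the smaller threshold gives DM1 the larger expected net profit.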

Because the information structure is dynamic, the employee has
complete control over what the employer can infer from his observation
u about the underlying state of nature x. Thus, it is not too
surprising that by allowing the employer to make different kinds of
inferences about the functional relationship between x and u,
different functional forms for the equilibria can be obtained.
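The point that DM1's strategy determines what DM2 can infer can be illustrated on the two-type population of Example 2.2. The sketch below computes the conditional distribution of x given u induced by a strategy; the prior q = 0.4 and the two candidate strategies (one separating, one pooling) are hypothetical choices made only for illustration.

```python
from collections import defaultdict

def posterior(prior, strategy):
    """Return {u: {x: Pr(x | u)}} induced by the prior on x and the map x -> u."""
    joint = defaultdict(dict)
    for x, p in prior.items():
        joint[strategy(x)][x] = p
    result = {}
    for u, weights in joint.items():
        total = sum(weights.values())
        result[u] = {x: p / total for x, p in weights.items()}
    return result

prior = {1: 0.4, 2: 0.6}                 # illustrative q = 0.4

def separating(x):
    return 0.0 if x == 1 else 1.5        # each type sends its own signal

def pooling(x):
    return 0.0                           # both types send the same signal

post_sep = posterior(prior, separating)
post_pool = posterior(prior, pooling)
print(post_sep)    # the signal reveals the type
print(post_pool)   # the signal carries no information beyond the prior
```

The same prior yields a degenerate posterior under the separating strategy and the prior itself under pooling, which is the sense in which the conditional density is solution-dependent.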


3.  Comparison  of  Equilibria 

In Example 2.1, $\gamma_1$ is a one-to-one mapping from a continuous
set of abilities to a continuous set of educational levels. In Example
2.2, $\gamma_1$ is also one-to-one, but this time the sets are
discrete. In both of these examples, the employer can precisely
determine ability from merely looking at the signal. In our model, we
will obtain equilibria somewhere between these; our equilibria involve
a continuous range of abilities but a discrete, finite number of given
signals. Thus, our mappings from ability to signals are many-to-one.*
We believe our equilibria are intuitively appealing for several
reasons. First of all, in actuality, there are only a discrete number
of educational levels at which wages are offered, for example,
bachelor, master, and doctorate degrees.† Secondly, many different
types of people choose the same signal, suggesting a many-to-one
mapping. Lastly, employers are limited in the amount of information
processing they can do, so that they can handle only a discrete number
of signals.

*Our equilibria are actually many-to-one solutions for Example 2.1. The
available range of signals remains continuous, but only a finite number
of signals are actually chosen by the employees.

†Although "pseudo" educational levels, such as "master's degree with
two years experience," have been created over time, they are still
discrete.

Another property Spence's discrete example has in common with his
continuous one is that there are multiple equilibria, in fact, a
continuum of multiple equilibria. As mentioned earlier, some are
Pareto-superior to others in the sense that they give a higher expected
net profit $J_1$ to DM1. Spence points out that the Pareto-inferior
solutions are inefficient in the sense that people are overinvesting in
the signal by purchasing more education than is necessary to signal
their ability levels. Spence [3] and Riley [4] have discussed how to
choose the Pareto-optimal solution, if possible, that is, the solution
which has no other solutions Pareto-superior to it, in order to
eliminate or reduce the inefficiencies. However, they assume that the
employer has the power to manipulate both the wage schedule $\gamma_2$
and the signal levels u by changing the parameters of the problem, in
the first example by varying k, and in the second by varying $u^*$. In
effect, this is equivalent to changing the signals already existing in
the market. In our equilibria, we assume that the parameters, and hence
the signals, are fixed exogenously. This reduces the multiple
equilibria to a single equilibrium, in general. The justification for
this assumption is that when the employer comes into the market, the
educational levels used for pay scales are already determined. Only
after a long period of time can new levels be established. Table 2.1
summarizes the differences between our solutions and Spence's.

TABLE 2.1

Comparison of Equilibrium Classes

                  Ability       Signal (e.g., Education)

Spence            Discrete      Discrete, not fixed
Spence-Riley      Continuous    Continuous, not fixed
Ho-Kastner        Continuous    Discrete, fixed


4.  New  Multiple  Equilibrium  Classes 

In Example 2.1, we began by guessing $\gamma_1$ was one-to-one,
determined $\gamma_2$ from this $\gamma_1$, and then found that the
resulting solution was consistent with the original guess. In the
second example, we guessed $\gamma_2$ as a function of a parameter and
then through $\gamma_1$ determined the values of the parameter that
would give consistency. Thus, as mentioned in the introduction, an
equilibrium can be described as the solution to an implicit equation
resulting from a mathematically self-consistent loop, as shown in
Figure 2.2, where $p(x \mid u)$ is the conditional probability density
of x given the signal u, which is determined by $\gamma_1$. To find the
new equilibrium classes, we guess $\gamma_1$ in terms of some
parameters and determine the conditions on the parameters so that the
resulting $\hat{\gamma}_1$ from circling the loop is the original guess
$\gamma_1$.

FIG. 2.2  SELF-CONSISTENCY LOOP ILLUSTRATING IMPLICIT EQUATION

Figures 2.3 and 2.4 show the kind of equilibria we are looking for. We
assume ability x lies in some fixed range $[x_0, x_N]$. Let
$x_1, \ldots, x_{N-1}$ be points inside the interval such that

$$x_0 < x_1 < x_2 < \cdots < x_N. \qquad (2.4)$$

Let $u \in [u_0, u_N]$, and assume that ability types within each
subinterval $[x_i, x_{i+1})$ choose the same signal $u_i$. The
endpoints $x_i$ of these subintervals, except for $x_0$ and $x_N$, will
be called "breakpoints". No single person chooses the breakpoints; they
just reflect how the entire employee population divides itself. More
precisely, DM1's strategy is as follows (see Figure 2.3):

$$\gamma_1(x) = \begin{cases} u_i, & x \in [x_i, x_{i+1}) \quad \forall\, i = 0, \ldots, N-2 \\ u_{N-1}, & x \in [x_{N-1}, x_N] \end{cases}$$

It is clear from $J_1$ that we must have
$u_0 < u_1 < \cdots < u_{N-1}$;* otherwise, wages ($\gamma_2$) would be
a decreasing function of education level, an intuitively unappealing
result. Since $\gamma_1$ assumes discrete values over a finite number
of intervals, the assumptions for this problem will be a discrete
version of the first two assumptions above, that is,
i') $\Delta c / \Delta u > 0$ and
ii') $\Delta^2 c / \Delta x \Delta u < 0$. The third assumption will
not be needed to prove the sufficient conditions for this class of
solutions.

*We are assuming here that $\gamma_2$ is a monotonically increasing
function of educational level.

The employer can now compute his strategy, after he computes the
conditional density function (as shown in Figure 2.2):

$$p(x \mid u_i) = \begin{cases} \dfrac{p(x)}{F(x_{i+1}) - F(x_i)}, & x \in [x_i, x_{i+1}) \\ 0, & \text{otherwise} \end{cases} \qquad \forall\, i = 0, \ldots, N-1,$$

with the interval closed at $x_N$ when $i = N-1$ (p is the probability
density function and F is the distribution function), and

$$\gamma_2(u_i) = E_{/u_i}(s) = \frac{\int_{x_i}^{x_{i+1}} s(u_i, x)\, p(x)\, dx}{F(x_{i+1}) - F(x_i)} \triangleq g_i(x_i, x_{i+1}) \triangleq v_i. \qquad (2.5)$$

The variables $v_i$ represent the actual wage values, and the functions
$g_i$ show the dependence of wages on $x_i$ and $x_{i+1}$. So far,
$\gamma_2$ is defined only for the discrete signals $u_i$,
$i = 0, \ldots, N-1$. In order to have our equilibria be many-to-one
solutions for Example 2.1, $\gamma_2$ must be a Nash solution in the
strategy space of measurable mappings defined over the entire interval
$[u_0, u_N]$. We arbitrarily define $\gamma_2$ completely as

$$\gamma_2(u) = \begin{cases} v_i, & u \in [u_i, u_{i+1}) \quad \forall\, i = 0, \ldots, N-2 \\ v_{N-1}, & u \in [u_{N-1}, u_N] \end{cases}$$

Thus, the employer's strategy also looks like a step function (Figure
2.4). As will now be shown, this particular form for $\gamma_2$ gives
us the results we need for a simple equilibrium condition.
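The wage in (2.5) is the mean productivity of the ability group that sends a given signal. A minimal numerical sketch follows; it uses a midpoint rule, accepts an unnormalized density (the normalization cancels in the ratio), and the names `group_wage`, `s_linear`, and the two densities are hypothetical choices for illustration only.

```python
import math

def group_wage(s, p, u_i, x_lo, x_hi, n=2000):
    """Midpoint-rule estimate of the ratio in (2.5) over [x_lo, x_hi].

    The denominator integrates p over the interval, which equals
    F(x_hi) - F(x_lo) when p is a normalized density.
    """
    dx = (x_hi - x_lo) / n
    num = den = 0.0
    for k in range(n):
        x = x_lo + (k + 0.5) * dx
        num += s(u_i, x) * p(x) * dx
        den += p(x) * dx
    return num / den

def s_linear(u, x):
    return x                                   # productivity s(u, x) = x

def uniform(x):
    return 1.0                                 # flat density, any scale

def gaussian(x):
    return math.exp(-0.5 * (x - 2.0) ** 2)     # unnormalized bell curve

print(group_wage(s_linear, uniform, 0.0, 1.0, 1.6))    # interval midpoint
print(group_wage(s_linear, gaussian, 0.0, 1.0, 1.6))   # pulled toward x = 2
```

With a flat density and s = x the wage is the interval midpoint; a density that rises across the interval shifts the wage toward the upper endpoint.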

Given the wage schedule $\gamma_2$, DM1 can continue around the loop
and compute a new strategy $\hat{\gamma}_1$:

$$\hat{\gamma}_1(x) = \arg\max_u \, [\gamma_2(u) - c(u,x)]$$

As shown in Figure 2.4, assumption i') that $\Delta c / \Delta u > 0$
implies that DM1 will only consider choosing among
$u_0, \ldots, u_{N-1}$. Therefore,

$$\hat{\gamma}_1(x) = \arg\max_{u_i \in \{u_0, \ldots, u_{N-1}\}} [g_i(x_i, x_{i+1}) - c(u_i, x)]$$

In order to attain self-consistency and have
$\hat{\gamma}_1 \equiv \gamma_1$, we want, for all
$i = 0, \ldots, N-1$ (omitting the arguments of $g_i$ for simplicity)

$$g_i - c(u_i, x) > g_j - c(u_j, x) \quad \forall\, j \neq i \text{ and } x \in [x_i, x_{i+1})$$

The following proposition states that if people whose ability levels
are at a breakpoint $x_i$ are indifferent between the educational
levels $u_i$ and $u_{i-1}$, then $\hat{\gamma}_1 = \gamma_1$ for all x
except the breakpoints.
PROPOSITION 2.1: If

$$g_i - c(u_i, x_i) = g_{i-1} - c(u_{i-1}, x_i) \quad \forall\, i = 1, \ldots, N-1 \qquad (2.6)$$

then

$$g_i - c(u_i, x) > g_j - c(u_j, x) \qquad (2.7)$$

for all $j \neq i$ and for all $x \in (x_i, x_{i+1})$.

Proof. From assumptions i') and ii'),

$$c(u_i, x) - c(u_{i+1}, x) < c(u_i, x_{i+1}) - c(u_{i+1}, x_{i+1}) \quad \forall\, x < x_{i+1}$$

From (2.6),

$$g_i = g_{i+1} - c(u_{i+1}, x_{i+1}) + c(u_i, x_{i+1})$$

Then

$$g_i - c(u_i, x) > g_{i+1} - c(u_{i+1}, x) \quad \forall\, x < x_{i+1}$$

For all $x \in (x_i, x_{i+1})$,

$$g_i - c(u_i, x) > g_{i+1} - c(u_{i+1}, x) > g_{i+2} - c(u_{i+2}, x) > \cdots$$

so that

$$g_i - c(u_i, x) > g_j - c(u_j, x) \quad \forall\, j > i.$$

Similarly,

$$g_i - c(u_i, x) > g_{i-1} - c(u_{i-1}, x) \quad \forall\, x > x_i$$

implies

$$g_i - c(u_i, x) > g_j - c(u_j, x) \quad \forall\, j < i, \; \forall\, x \in (x_i, x_{i+1}). \qquad \text{Q.E.D.}$$
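Proposition 2.1 can be checked numerically on a small instance. The sketch below assumes c(u, x) = u/x, s(u, x) = x, ability uniform on [1, 2.5], and three signals u = 0, 1, 2 (the specification used in Example 2.3). The breakpoints are taken as the ordered pair solving the indifference condition (2.6) for this specification, written here in a closed form that was worked out for this illustration.

```python
import math

x0, x3 = 1.0, 2.5
u = [0.0, 1.0, 2.0]
r = math.sqrt(89.0)
x = [x0, (r - 3.0) / 4.0, (r + 13.0) / 10.0, x3]       # breakpoints x_0..x_3
v = [(x[i] + x[i + 1]) / 2.0 for i in range(3)]        # wages from (2.5)

def net_profit(j, ability):
    """Wage minus cost for signal j at the given ability."""
    return v[j] - u[j] / ability

def indifference_gap(i):
    """Left side minus right side of (2.6) at breakpoint x_i."""
    return net_profit(i, x[i]) - net_profit(i - 1, x[i])

def preferred_signal(ability):
    """Index of the signal with the largest net profit."""
    return max(range(3), key=lambda j: net_profit(j, ability))

print([indifference_gap(i) for i in (1, 2)])            # both close to zero
for i in range(3):
    interior = [x[i] + t * (x[i + 1] - x[i]) for t in (0.1, 0.5, 0.9)]
    print(i, [preferred_signal(a) for a in interior])   # each entry equals i
```

Indifference at the two breakpoints is enough to make every interior ability type strictly prefer its own group's signal, as (2.7) asserts.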

The following corollary states that the indifference condition (2.6)
implies that at the breakpoint $x_i$, $u_i$ and $u_{i-1}$ are preferred
over all other signals.

COROLLARY 2.1. Given (2.6),

$$g_i - c(u_i, x_i) > g_j - c(u_j, x_i) \quad \forall\, j \neq i-1, i, \quad i = 1, \ldots, N-1 \qquad (2.8)$$

Proof. Suppose there exists $j \neq i-1, i$ such that
$g_j - c(u_j, x_i) \geq g_i - c(u_i, x_i)$. If $j > i$, then by
assumptions i') and ii'),

$$-c(u_i, x_i) + c(u_j, x_i) > -c(u_i, x) + c(u_j, x)$$

so that

$$g_j - c(u_j, x) > g_i - c(u_i, x) \quad \forall\, x \in (x_i, x_{i+1}),$$

which contradicts (2.7). Similarly, if $j < i-1$, then

$$g_j - c(u_j, x) > g_{i-1} - c(u_{i-1}, x) \quad \forall\, x \in (x_{i-1}, x_i)$$

which also contradicts (2.7).                                  Q.E.D.

Therefore, if (2.6) holds, and if we define $\hat{\gamma}_1$ at the
breakpoints as

$$\hat{\gamma}_1(x_i) = u_i, \quad i = 0, \ldots, N-1 \quad \text{and} \quad \hat{\gamma}_1(x_N) = u_{N-1} \qquad (2.9)$$

then $\hat{\gamma}_1 \equiv \gamma_1$ for all $x \in [x_0, x_N]$.

Thus, for fixed levels of education (u), "optimizing"* means choosing
the breakpoints $\{x_i\}$, what the employee population as a whole
should do, and wage levels $\{v_i\}$, what the employer should do. We
have shown that the necessary and sufficient conditions† for optimality
reduce to the set of equalities (2.6) and inequalities (2.4) involving
the $x_i$'s and $v_i$'s. The inequalities say that the breakpoints
should be "in order". The equalities say that if the people whose
ability level is a breakpoint, say $x_i$, are indifferent between
choosing educational level $u_i$ and receiving wage $v_i$, and choosing
$u_{i-1}$ and receiving $v_{i-1}$, then the system is in equilibrium
and people are paid equal to the expected productivity of their
particular signaling group. We have reduced the Nash equilibrium of a
two-person decision problem to a feasible solution of equalities and
inequalities. Equations (2.6) provide an explicit method for computing
the equilibria. If the $u_i$'s are varied or if the number of signals N
is changed, then there are multiple equilibria. But if, as mentioned
earlier, the signals are fixed, then there are, in general, no multiple
solutions.

*This problem is different from those of Section 2. In Examples 2.1 and
2.2, u was found for each individual x. Here, the entire employee
population is considered in determining where the breakpoints should
be, and thus, what signals should be chosen.

†These conditions are clearly sufficient for optimality. However, they
are necessary for optimality only in the class of solutions we have
guessed, namely, many-to-one in the manner of Figures 2.3 and 2.4.
EXAMPLE 2.3: Let

    c(u,x) = u/x        p(x) uniform over $[x_0, x_N]$
    s(u,x) = x          N = 3

From the definition of $\gamma_2$ at equilibrium given by (2.5),

$$v_i = \frac{x_i + x_{i+1}}{2}, \quad i = 0, 1, 2$$

From (2.6)

$$x_i = \frac{u_i - u_{i-1}}{v_i - v_{i-1}}, \quad i = 1, 2$$

Combining these, we have the equilibrium conditions

$$x_1 = \frac{2(u_1 - u_0)}{x_2 - x_0} \qquad (2.10a)$$

$$x_2 = \frac{2(u_2 - u_1)}{x_3 - x_1} \qquad (2.10b)$$

Conditions (2.10) depend on u only through the differences $u_1 - u_0$
and $u_2 - u_1$, since c is linear in u and s is independent of u.

If $x_0 = 1$, $x_3 = 2.5$, $u_1 - u_0 = u_2 - u_1 = 1$, then the pair
$(x_1, x_2)$ satisfying (2.10) and (2.4) is*

$$(x_1, x_2) = \left(\tfrac{1}{4}(\sqrt{89} - 3), \; \tfrac{1}{10}(\sqrt{89} + 13)\right) \approx (1.6, \; 2.24).$$

*The pair (-3.1, .36) also satisfies (2.10) but not (2.4). In every
example we tried, only one of the pairs satisfying (2.6) also satisfied
(2.4), but we have not ruled out the possibility that both pairs might
be solutions.
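For three signals, the two conditions (2.10a) and (2.10b) can be reduced to a single quadratic in the second breakpoint, which gives both candidate pairs at once. The sketch below is one way to do this, not the procedure used in the text; it assumes c = u/x, s = x, a uniform density, and a nonzero lower endpoint, and the function names are illustrative.

```python
import math

def solve_two_breakpoints(x0, x3, d1, d2):
    """Solve x1*(x2 - x0) = 2*d1 and x2*(x3 - x1) = 2*d2 for (x1, x2).

    d1 = u1 - u0 and d2 = u2 - u1. Adding the two equations gives
    x1 = (x3*x2 - 2*(d1 + d2)) / x0, which turns the first equation into
    x3*x2^2 - (x0*x3 + 2*(d1 + d2))*x2 + 2*d2*x0 = 0. Requires x0 != 0.
    """
    a = x3
    b = -(x0 * x3 + 2.0 * (d1 + d2))
    c = 2.0 * d2 * x0
    disc = b * b - 4.0 * a * c
    if disc < 0:
        return []
    roots = [(-b + sign * math.sqrt(disc)) / (2.0 * a) for sign in (1.0, -1.0)]
    return [((x3 * x2 - 2.0 * (d1 + d2)) / x0, x2) for x2 in roots]

def in_order(x0, x1, x2, x3):
    """Ordering requirement (2.4) for N = 3."""
    return x0 < x1 < x2 < x3

pairs = solve_two_breakpoints(1.0, 2.5, 1.0, 1.0)
for x1, x2 in pairs:
    print(round(x1, 2), round(x2, 2), in_order(1.0, x1, x2, 2.5))
```

For the data of Example 2.3 the first pair is the ordered solution and the second is the out-of-order pair mentioned in the footnote.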

The title of this section promises multiple equilibria, but here we have
a unique equilibrium. This occurs because, as mentioned above, we
assume $u_0$, $u_1$, and $u_2$ are fixed.
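The two candidate pairs can be checked numerically. The following is a minimal sketch, assuming the Example 2.3 values ($x_0 = 1$, $x_3 = 2.5$, unit signal spacing); the function names `equilibrium_pairs` and `in_order` are illustrative and do not come from the original text. It eliminates $x_1$ from (2.10) to obtain a quadratic in $x_2$ and then tests the order condition (2.4) for each root.

```python
import math


def equilibrium_pairs(x0=1.0, x3=2.5, d1=1.0, d2=1.0):
    """Return both (x1, x2) solutions of (2.10a)-(2.10b).

    d1 = u1 - u0 and d2 = u2 - u1.  Using (2.10b) to write
    x1 = x3 - 2*d2/x2 and substituting into (2.10a) gives
    x3*x2**2 - (x0*x3 + 2*d1 + 2*d2)*x2 + 2*d2*x0 = 0.
    """
    a = x3
    b = -(x0 * x3 + 2.0 * d1 + 2.0 * d2)
    c = 2.0 * d2 * x0
    root = math.sqrt(b * b - 4.0 * a * c)
    pairs = []
    for x2 in ((-b + root) / (2.0 * a), (-b - root) / (2.0 * a)):
        x1 = x3 - 2.0 * d2 / x2  # back-substitute into (2.10b)
        pairs.append((x1, x2))
    return pairs


def in_order(x1, x2, x0=1.0, x3=2.5):
    """Order condition (2.4): x0 < x1 < x2 < x3."""
    return x0 < x1 < x2 < x3


for x1, x2 in equilibrium_pairs():
    print(round(x1, 2), round(x2, 2), in_order(x1, x2))
```

Only the first pair passes the order test, in agreement with the discussion above.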

Table 2.2 gives numerical results for the cases N = 2, 3, 4.
Figures 2.5 and 2.6 illustrate that for N = 4, $\gamma_1$ and $\gamma_2$ are already
beginning to look like the square and square root functions,
respectively, which are the solutions to Example 2.1, Spence's
continuous one-to-one case.

Other  functions  of  c and  s,  and  other  probability  densities 
p(x),  such  as  the  Gaussian  distribution,  also  produce  new  classes  of 
multiple  equilibria  for  different  values  of  N,  but  the  details  are 
omitted  here. 

TABLE 2.2

Numerical Examples for the Uniform Distribution

N    x_0, x_1, ...            u_0, u_1, ...    v_0, v_1, ...
2    1, 1.33, 2.5             0, 1             1.17, 1.92
3    1, 1.6, 2.24, 2.5        0, 1, 2          1.3, 1.92, 2.37
4    1, 1.6, 2.23, 2.6, 3     0, 1, 2, 3       1.3, 1.92, 2.42, 2.8
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5.  Adjustment  Procedure  and  Stability 

An important question to ask about these new equilibrium
classes is whether they are stable. That is, if the system is not in
equilibrium, will it return to equilibrium? In order to answer this
question, an adjustment procedure must be outlined for each decision
maker, describing how he would react if the system were in
disequilibrium. Then we can see if these actions bring all of the DMs back
to the equilibrium.

A reasonable  adjustment  scheme  for  the  employer  is  the  one 
Spence  proposes  in  his  definition  of  equilibrium.  He  states  that  the 
employer  always  pays  wages  equal  to  the  expected  productivity,  based 
on  the  statistical  data  revealed  by  the  previous  employee  population. 
The  data  the  employer  observes  after  the  employees  are  hired  are 
just  the  breakpoints,  that  is,  which  range  of  abilities  choose  which 
signal.  Thus,  he  uses  the  data  he  observed  in  the  past  stage  to  make 
his  estimate  in  the  current  stage.  This  is  written 


$$v_i(t) = g_i\big(x_i(t), x_{i+1}(t)\big) = E_{x \mid u_i}\big[s(u_i, x)\big] = \frac{\int_{x_i(t)}^{x_{i+1}(t)} s(u_i, x)\, p(x)\, dx}{F(x_{i+1}(t)) - F(x_i(t))} \tag{2.11}$$

We call this "full equilibrium adjustment", because (2.11) is just the
equilibrium condition (2.5). That is, the objective function is
minimized at each stage.


The adjustments that the employee population makes to the
change in wage schedules can be argued, on the other hand, to be
more gradual or infinitesimal. After the employer has adjusted his
wage schedule, the individuals at the breakpoints, who were once
indifferent, now have a clear choice as to which signal they prefer.
This is reflected in the shifting of the breakpoints. People on one
side of the breakpoint slowly drift over to the other side as they learn
how to respond to the wage schedule. The net result can be modeled
by a set of steepest ascent equations for the breakpoints,

$$\dot{x}_i(t) = \epsilon\, \frac{\partial J_1}{\partial x_i} \tag{2.12}$$

where $\epsilon > 0$ is a constant defining the infinitesimal incremental step.
In other words, no single individual changes $x_i$; the shift is due to the
combined action of the entire employee population. We call (2.12)
"partial equilibrium adjustment" because, although each step moves
in the direction of maximizing $J_1$, $J_1$ is not actually maximized at
each stage. This defines the other half of the adjustment procedure.

Substituting (2.11) into (2.12) results in a set of differential
equations

$$\dot{x}_i = \delta_i(x_1, \ldots, x_{N-1}), \qquad i = 1, \ldots, N-1.$$

This has reduced the problem of adjustment of individual actions to the
question of stability of a set of differential equations. The stability
result we need is a version of a Lyapunov-type stability theorem due to
Malishevskii [6]. His study of stability of individual actions in
goal-oriented behavior is in the same spirit as our problem. His
theorem, restated in general terms, is as follows:


THEOREM (Lyapunov-Malishevskii): Let $\dot{x}_i = \delta_i(x)$, where
$x = (x_1, \ldots, x_n)^T$ is an element in some domain $D \subset R^n$. Define the
matrix

$$A = \left[\frac{\partial \delta_i}{\partial x_j}\right].$$

If $A + A^T < 0$ for all x in D and if the equilibrium point $x^*$
(where $x^*$ satisfies $\delta_i(x^*) = 0$ for all i) exists in D, then any
trajectory x(t) which remains in D converges to $x^*$ (uniform
asymptotic global stability).

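The following example is an illustration of how this theorem can
be applied to the job market model. From the definition of $J_1$,

$$J_1 = E[\gamma_2 - c] = \sum_{k=0}^{N-1} \int_{x_k}^{x_{k+1}} \big[v_k - c(u_k, x)\big]\, p(x)\, dx,$$

$$\frac{\partial J_1}{\partial x_i} = \big[v_{i-1} - c(u_{i-1}, x_i) - (v_i - c(u_i, x_i))\big]\, p(x_i).$$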


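EXAMPLE: Consider Example 2.3 above, and let $\epsilon'$ represent
$\epsilon\, p(x_i) = \epsilon/(x_N - x_0)$. Then, with $v_i = (x_i + x_{i+1})/2$,

$$\delta_i = \epsilon' \left[\frac{u_i - u_{i-1}}{x_i} - \frac{x_{i+1} - x_{i-1}}{2}\right], \qquad i = 1, 2.$$

Let $D = \{(x_1, x_2) : x_1 > 0,\ x_2 > 0\}$. By inspection, it is clear that the
trajectory $x(t) = (x_1(t), x_2(t))$ defined by $\delta_1$ and $\delta_2$ remains in D.

$$A + A^T = \epsilon' \begin{bmatrix} -\dfrac{2(u_1 - u_0)}{x_1^2} & 0 \\ 0 & -\dfrac{2(u_2 - u_1)}{x_2^2} \end{bmatrix} < 0 \quad \text{in } D.$$

Therefore, in D, $x(t) \to x^* \approx (1.6, 2.24)$, independent of $\epsilon'$. This
is as it should be, since one cannot in general be certain of the value
of $\epsilon$. The above procedure, in fact, also holds for arbitrary discrete
levels of signals; that is, it is independent of N.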
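The convergence claimed in this example can be illustrated by integrating the breakpoint dynamics numerically. The sketch below is not the author's procedure: it assumes a forward-Euler discretization of $\dot{x}_i = \delta_i$, the Example 2.3 values ($x_0 = 1$, $x_3 = 2.5$, unit signal spacing), and a hypothetical starting point (1.5, 2.2) chosen close to the equilibrium. The name `simulate_adjustment` and the step parameters are illustrative only.

```python
def simulate_adjustment(x1, x2, x0=1.0, x3=2.5, d1=1.0, d2=1.0,
                        eps=1.0, dt=0.01, steps=10000):
    """Forward-Euler integration of dx_i/dt = delta_i for N = 3.

    delta_i = eps * [(u_i - u_{i-1}) / x_i - (x_{i+1} - x_{i-1}) / 2],
    with d1 = u1 - u0 and d2 = u2 - u1.  eps plays the role of eps'.
    """
    for _ in range(steps):
        delta1 = eps * (d1 / x1 - (x2 - x0) / 2.0)
        delta2 = eps * (d2 / x2 - (x3 - x1) / 2.0)
        # Both breakpoints are updated from the same (old) state.
        x1, x2 = x1 + dt * delta1, x2 + dt * delta2
    return x1, x2


# Two different step constants, same limit point.
print(simulate_adjustment(1.5, 2.2, eps=1.0, steps=10000))
print(simulate_adjustment(1.5, 2.2, eps=0.5, steps=20000))
```

Changing `eps` changes only the speed of approach, not the limit, which is the numerical counterpart of the statement that the result is independent of $\epsilon'$.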

This is not the whole story, however, because the breakpoints
still must satisfy (2.4), that is, be "in order". The previous example
can serve to describe what might happen before the equilibrium is
reached. The stability result says that x(t) will converge to
$(x_1^*, x_2^*)$. But the extra constraint of order defines a region where
$x_0 < x_1 < x_2 < x_3$, which we call the "feasible region" (FR), as shown
in Figure 2.7. Even if a trajectory starts inside this region, it may
leave the region before it reaches the equilibrium point, as illustrated
by the dotted curve in Figure 2.7. If this happens, it means that two
breakpoints, or a breakpoint and an endpoint, have coalesced. One of
the breakpoints has disappeared, meaning that one of the signals is no
longer being chosen by any individuals. The system then drops back
down to the next lower number of signals. However, there is always
a circle of initial points where convergence is guaranteed, because
Malishevskii proves that the norm of the vector x(t) monotonically
decreases. This circle is defined as follows:

Let $d^*$ = minimum distance (in the Euclidean $\ell_2$ norm) between
$x^*$ and the boundary of FR. Then the circle of guaranteed
convergence = $\{x \in FR : \|x - x^*\| \le d^*\}$. Thus, what started out as a global
stability result is actually a sort of local stability result, since
convergence for this problem is guaranteed only locally.

FIG. 2.7 STABILITY FOR EXAMPLE 2.3
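For the feasible region defined by the order condition, $d^*$ can be computed directly, since FR is bounded by the hyperplanes $x_1 = x_0$, $x_i = x_{i+1}$, and $x_{N-1} = x_N$. The following sketch assumes that geometry; the function name `guaranteed_radius` is illustrative and not from the original text.

```python
import math


def guaranteed_radius(x_star, x0, xN):
    """Distance d* from the equilibrium breakpoints to the boundary of FR.

    FR = {x0 < x1 < ... < x_{N-1} < xN}.  The faces x1 = x0 and
    x_{N-1} = xN are axis-aligned, so the distance to them is the gap
    itself; a face x_i = x_{i+1} lies at distance gap / sqrt(2).
    """
    pts = [x0] + list(x_star) + [xN]
    gaps = [b - a for a, b in zip(pts, pts[1:])]
    dists = [gaps[0], gaps[-1]]
    dists += [g / math.sqrt(2.0) for g in gaps[1:-1]]
    return min(dists)


# Example 2.3 equilibrium, rounded to four places.
print(guaranteed_radius((1.6085, 2.2434), 1.0, 2.5))
```

For Example 2.3 the binding face is $x_2 = x_3$, so under these assumptions the radius is simply $x_3 - x_2^*$.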

Another  possibility  is  that  the  equilibrium  point  itself  is  not 
in  FR,  as  shown  in  the  next  example. 

EXAMPLE  2.4:  Consider  Example  2.3,  but  with  N = 4,  and 


Then the solution to the equilibrium equality conditions with all
positive components,

$$(x_1^*, x_2^*, x_3^*) = (2.2,\ 1.9,\ 3.3),$$

does not satisfy the "order" condition (2.4). This phenomenon does
not depend on N, the number of levels. Table 2.2, Figure 2.5, and
Figure 2.6 demonstrate an example where the equilibrium
breakpoints for N = 4 do lie inside the feasible region.

6. Competition and Specialization

6.1 Specialization

So far, we have assumed that the employer hires people of all
abilities in $[x_0, x_N]$. Spence [3] and Riley [4] show how specialization,
i.e., hiring people of only some abilities, can lead to nonexistence of
an equilibrium when competition from other employers is brought in
explicitly. In particular, Spence states that with specialization (1) a
one-step wage schedule (N = 1 in our notation) definitely cannot be
an equilibrium, and (2) if this one-step schedule is preferred by all
employees to multistep (or continuous) schedules, then there is no
equilibrium in the market. We will show that this last conclusion also
holds for our equilibrium classes. However, we will also show in
Section 6.2 that the nonexistence of an equilibrium can be partially
resolved through use of the criterion function $J_2$.

The argument for (1) proceeds by way of Example 2.3, summarized
as the N = 3 case in Table 2.2. The only available signals are
u = 0, 1, 2. Suppose an employer ignores the last two signals and
offers the one-step schedule

$$\gamma_2(u) = E[x] = \frac{x_0 + x_N}{2} = 1.75 \qquad \forall\, u \ge 0$$

as shown in Figure 2.8. Let

$$p_x = \text{optimal net profit of individual of type } x = \gamma_2(u^*) - c(u^*, x),$$

where

$$u^* = \arg\max_u \big[\gamma_2(u) - c(u, x)\big].$$

To express c(u,x) as a function of u for fixed x, we use instead the
notation $c_x(u)$, where

$$c_x(u) = \frac{u}{x}, \quad \text{for this example.}$$

If $\gamma_2(u) = 1.75$ for all u, then $u^* = 0$ and $p_x = 1.75$ for all x.
Figure 2.8 shows the cost curve (line, in this case) $c_x$ and the cost
curve shifted by the optimal net profit, $c_x + p_x$. The line $c_x + p_x$
is the indifference line for a person of ability x; that is, any wage
offered along this line gives net profit $p_x = 1.75$. Thus, any wage in
the region above the line is preferred to the original one-step schedule
since the net profit is greater than $p_x$.

FIG. 2.8 SPECIALIZATION

Suppose an employer who was able to specialize offered the
schedule in Figure 2.8 shown by the dotted line; that is,

$$\gamma_2(u) = \begin{cases} 0, & 0 \le u < 1 \\ 2.23, & u \ge 1 \end{cases}$$

Then, by the previous argument, all persons of ability types
$x \in [2.1, 2.5]$ will prefer this new schedule. Since their average
productivity is 2.3, which is greater than the wage 2.23, the employer
makes a profit. The original one-step equilibrium is destroyed, and
Spence's first statement is demonstrated for this example. Whereas
Spence demonstrated this by allowing the employer to change the
discrete signaling levels u, we have demonstrated this without
creating new signals, just ignoring some of the already existing
signals.
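The figures quoted in this argument can be reproduced with a few lines of arithmetic. The sketch below assumes that a defecting employee signals u = 1 to obtain the wage 2.23, that abilities are uniform up to 2.5, and that productivity equals ability (s = x), as in Example 2.3; the function name `specialization_check` is illustrative.

```python
def specialization_check(pooled_wage=1.75, offer_wage=2.23,
                         offer_signal=1.0, x_max=2.5):
    """Which abilities prefer the specialized offer, and is it profitable?

    Type x prefers the new offer when
        offer_wage - offer_signal / x > pooled_wage,
    i.e. when x exceeds offer_signal / (offer_wage - pooled_wage).
    """
    x_min = offer_signal / (offer_wage - pooled_wage)
    # Uniform abilities and s(u, x) = x: mean productivity of the defectors.
    avg_productivity = (x_min + x_max) / 2.0
    return x_min, avg_productivity


x_min, avg = specialization_check()
print(round(x_min, 2), round(avg, 2), avg > 2.23)
```

The cutoff and the mean productivity round to the values 2.1 and 2.3 used in the text, and the mean exceeds the offered wage.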

To demonstrate the second statement, we must show how it can
happen that all individuals prefer the one-step (N = 1) wage schedule.
For this example, since there are three signals, there are three
possible equilibrium wage schedules, corresponding to N = 1, N = 2,
and N = 3. Figure 2.9 shows the N = 3 and N = 1 wage schedules.
Wages in region R are preferred over the N = 3 schedule by those
people whose abilities are at the endpoints of the ability interval,
namely x = 1 and 2.5. The following argument shows that this
region is also preferred by all abilities in between. All other
shifted cost lines are also indifference lines and so must, by
construction, pass through one of the points A, B, or C in Figure 2.9
with slopes between that of $c_1 + p_1$ and $c_{2.5} + p_{2.5}$, namely, 1
and 2/5. Thus, none of these cost lines will intersect the region R,
so that the one-step wage schedule is preferred by everyone to the
N = 3 schedule. Thus, everyone chooses $u_0 = 0$ and signaling
ceases. Graphically, we see that this is true when the intersection of
the expected productivity line given by $v = E(x) = (x_0 + x_N)/2$ with the
v-axis lies above that of the line $c_{x_N} + p_{x_N}$. More precisely, this
condition can be stated as:
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FIG.  2.9  ONE-STEP  VS.  THREE-STEP  WAGE  SCHEDULE 
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PROPOSITION  2.2.  Consider  the  equilibrium  N-step  schedule 
of Example 2.3. If

$$\frac{x_0 + x_N}{2} > v_{N-1} - \frac{u_{N-1}}{x_N},$$

then the one-step equilibrium schedule

$$v = \gamma_2(u) = \frac{x_0 + x_N}{2} \qquad \forall\, u$$

is preferred to the N-step schedule.

A simple calculation shows that the N = 2 case in Table 2.2
also satisfies this condition. Therefore, the one-step schedule is
preferred to all possible N > 1 equilibrium schedules with the given
signals and parameter specifications. If specialization is allowed, we
can then conclude that, for this particular example, there is no
equilibrium in the class of multistep solutions.
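The preference for the one-step schedule can also be checked directly from the Table 2.2 wages, without going through the graphical argument. The sketch below assumes c(u, x) = u/x and the rounded table values; the helper name `best_net_profit` and the grid of abilities are illustrative choices.

```python
def best_net_profit(x, wages, signals):
    """Largest net profit v_i - u_i / x available to an individual of type x."""
    return max(v - u / x for v, u in zip(wages, signals))


one_step = 1.75  # (x0 + xN) / 2 with x0 = 1, xN = 2.5
schedules = {
    2: ([1.17, 1.92], [0, 1]),           # Table 2.2, N = 2
    3: ([1.3, 1.92, 2.37], [0, 1, 2]),   # Table 2.2, N = 3
}

# Evaluate on a grid of abilities covering [1, 2.5].
abilities = [1.0 + 1.5 * k / 150 for k in range(151)]
for n, (wages, signals) in schedules.items():
    worst_gap = min(one_step - best_net_profit(x, wages, signals)
                    for x in abilities)
    print(n, worst_gap > 0)
```

In both cases every ability on the grid earns strictly less under the multistep schedule than the pooled wage 1.75, with the smallest margin at the highest ability.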


6.2 Competition

The whole problem of nonexistence of equilibria from the
previous section rests on the premise that an employer will offer a
one-step wage schedule when it is preferable to the employees, in
order to compete with other employers. However, in analyzing this
competition, we have considered $J_1$, but have totally neglected $J_2$.
The outcome is the rather nonintuitive result that people of the
highest ability would sometimes prefer to be paid the same as people
of the lowest ability. This, it would seem, would lead to much job
dissatisfaction. Of course, if the number of signaling levels were
fixed, or there were no other employers, then the individual would
have no choice but to maximize $J_1$ over the available signals. But
in the situation described above, neither of these is the case. The
actual signals are exogenously given but the number of signals to be
used is not. A persuasive case might be made that an employer
offering a more differentiated wage schedule (e.g., N = 3) might very
well be preferred by the employee population to one who offers the
one-step schedule, even though the latter schedule makes more pure
economic sense to the employees from the viewpoint of $J_1$. However,
people are also concerned with being paid nearer to what they are
worth. We submit that our $J_2$, the mean square criterion, is an
attempt to capture this effect. To justify this conclusion in our
decision-theoretic framework, we must show that the value of $J_2$ for the
one-step schedule is larger than the value for the multi-step schedule;
that is, the one-step schedule is less preferable to the employer by
being less competitive. This is, in fact, the case for the example in
Figure 2.9 where $J_2(N = 1) = .1875$ and $J_2(N = 3) = .0278$.* In other
words, if additional signals are available, then there is, in general,
an incentive for an employer to offer a finer schedule when it leads to
a better $J_2$. A logical consequence of this argument is that the
employer would prefer more and more signals to differentiate people
of different abilities until Spence's continuous equilibrium is reached
(as in Example 2.1), where every ability level is paid its productivity,
and $J_2 = E[(\gamma_2 - s)^2] = 0$.

*A simple calculation shows that for c = u/x, s = x, and p(x) uniform,

$$J_2 = \frac{1}{12(x_N - x_0)} \sum_{i=0}^{N-1} (x_{i+1} - x_i)^3.$$
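The two $J_2$ values quoted for Figure 2.9 can be reproduced from the breakpoints alone. The sketch below assumes the uniform-density case with s = x, where each interval contributes its length cubed divided by $12(x_N - x_0)$; the function name `mean_square_criterion` is illustrative.

```python
def mean_square_criterion(breakpoints):
    """J2 = E[(gamma_2 - s)^2] for uniform p(x), s = x, and step wages
    equal to interval midpoints.

    `breakpoints` lists x0, x1, ..., xN in increasing order.
    """
    x0, xN = breakpoints[0], breakpoints[-1]
    cubes = sum((b - a) ** 3 for a, b in zip(breakpoints, breakpoints[1:]))
    return cubes / (12.0 * (xN - x0))


print(mean_square_criterion([1.0, 2.5]))                  # one-step schedule
print(mean_square_criterion([1.0, 1.6085, 2.2434, 2.5]))  # N = 3 equilibrium
```

The first value is the variance of a uniform variable on [1, 2.5]; the second agrees with the quoted N = 3 figure to about three decimal places.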

If this is true, then why don't we see employers constantly
creating new signals in hopes of attaining the continuous one-to-one
equilibrium? First of all, we argued above that it is very difficult to
create new educational levels, and assumed that the signals were fixed
exogenously when an employer entered the market. Secondly, adding
new signals does not necessarily improve $J_2$ since we have the
constraints that the old signals cannot be easily discarded and the
breakpoints must be in order. For example, if the new signal $u_3 = 2.25$
is introduced for the case N = 3 in Figure 2.9, and if the employees
and employers adjust so as to settle down at a new equilibrium at the
N = 4 level (see Section 5 on adjustment and stability), then
$J_2(N = 4) = .0355 > .0278 = J_2(N = 3)$. In fact, it can be shown that any
$u_3 > 2$ which produces an equilibrium solution in the feasible region
yields a $J_2$ value greater than $J_2(N = 3)$. Thirdly, for each new
signal there are attendant costs of transmission and administration. In
a sufficiently differentiated wage schedule, these second order costs
must be accounted for and traded against the advantages of new
signals. Consequently, we do not see the constant creation of new
signals in the short run nor the eventual infinite differentiation of wage
schedules in the long run.
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7.  Threshold  Effects 

7.1 Introduction

In Section 5 we observed that, under certain circumstances,
signals disappeared. This phenomenon leads us to the question of
whether changing the parameters of the problem also causes signals
to disappear. In this section, we show that not only do signals
disappear, but also signaling ceases altogether when the parameters
cross threshold points. Three types of parameters will be studied
to see if they exhibit threshold effects. We will investigate whether
signaling ceases when 1) signaling costs get too high, 2) the variance
of the unknown state of the world x gets too small, and 3) signaling
noise gets too large.


7.2 Signaling Cost

The first type of parameter to be studied is one affecting the
cost of signaling. To illustrate this, we modify the payoff function for
the employee population to

$$J_1 = E[\gamma_2 - \alpha c], \tag{2.13}$$

where $\alpha > 0$ is a scalar cost parameter. For simplicity, the
argument will proceed by way of Example 2.3 for $N \ge 3$. With the new
payoff $J_1$ from (2.13), equilibrium conditions (2.10) become

$$x_i = \frac{2\alpha(u_i - u_{i-1})}{x_{i+1} - x_{i-1}}. \tag{2.14}$$

Since $0 < x_0 \le x_{i-1} < x_{i+1} \le x_N$,

$$\frac{1}{x_{i+1} - x_{i-1}} \ge \frac{1}{x_N - x_0}$$

and so

$$x_i \ge \frac{2\alpha(u_i - u_{i-1})}{x_N - x_0}, \qquad i = 1, \ldots, N-1. \tag{2.15}$$

As $\alpha$ increases, condition (2.15) will eventually be violated for some
i. This implies that a breakpoint has coalesced with another
breakpoint or an endpoint ($x_0$ or $x_N$). As a particular illustration of this,
consider the case of N = 3, as shown in Figure 2.10. It can be shown
explicitly for this example that as $\alpha$ increases, the breakpoints $x_1$
and $x_2$ move away from each other towards the endpoints. More and
more people choose the signal $u_1$. At first it may seem strange that
as signaling costs increase, fewer people choose the cheapest signal
$u_0$. To understand this, we must also look at how the wages are
changing. First of all, (2.14) can also be written as


$$x_i = \frac{\alpha(u_i - u_{i-1})}{v_i - v_{i-1}} \tag{2.16}$$

since

$$v_1 - v_0 = \frac{x_2 - x_0}{2}$$

and

$$v_2 - v_1 = \frac{x_3 - x_1}{2}.$$

Thus, when $x_1$ decreases and $x_2$ increases, both $v_1 - v_0$ and
$v_2 - v_1$ also increase. We can deduce from (2.16) that costs $\alpha$
increase faster than $v_2 - v_1$ but slower than $v_1 - v_0$. Therefore,
higher ability people switch to $u_1$ because their costs are rising
faster than their relative wages, and lower ability people switch to $u_1$
for the opposite reason.
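The movement of the two breakpoints with the cost parameter can be traced numerically. The sketch below is an illustration under stated assumptions, not the author's method: it takes the Example 2.3 values ($x_0 = 1$, $x_3 = 2.5$, unit signal spacing), solves the two N = 3 conditions by bisection on $x_2$, and returns `None` when no root lies in $[x_0, x_3]$. The function name `breakpoints_n3` is hypothetical, and the returned pair should still be checked against the order condition (2.4) for values of the cost parameter other than those shown.

```python
def breakpoints_n3(alpha, x0=1.0, x3=2.5, d1=1.0, d2=1.0, iters=200):
    """Solve the N = 3 equilibrium conditions with cost parameter alpha.

    The i = 2 condition gives x1 = x3 - 2*alpha*d2/x2; the i = 1
    condition then reduces to f(x2) = x1*(x2 - x0) - 2*alpha*d1 = 0,
    which is solved by bisection on [x0, x3].
    """
    def f(x2):
        x1 = x3 - 2.0 * alpha * d2 / x2
        return x1 * (x2 - x0) - 2.0 * alpha * d1

    lo, hi = x0, x3
    if f(lo) > 0.0 or f(hi) < 0.0:
        return None  # no sign change: x2 has run into an endpoint
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    x2 = (lo + hi) / 2.0
    return x3 - 2.0 * alpha * d2 / x2, x2


for alpha in (1.0, 1.05, 1.1, 1.15, 1.2):
    print(alpha, breakpoints_n3(alpha))
```

Over this range $x_1$ falls and $x_2$ rises as the cost parameter grows, and the last value returns `None` because $x_2$ has already reached $x_3$.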

Going back to Figure 2.10, it can be shown that $x_2$ reaches
$x_N$ (N = 3) before $x_1$ reaches $x_0$. This means that no one chooses the
signal $u_2$ anymore, and the system drops down to the next lower
level, namely, two signals and one breakpoint. This is the eventual
outcome from increasing $\alpha$, regardless of how many signals there
were at the start. (2.14) now becomes

$$x_1 = \frac{2\alpha(u_1 - u_0)}{x_N - x_0} \tag{2.17}$$

where the two remaining signals are labeled $u_0$ and $u_1$. As $\alpha$
increases, $x_1$ clearly increases until it coalesces with $x_N$, at which
point everyone chooses the cheaper signal $u_0$ and receives the wage
equal to the unconditional expected productivity ($\bar{x}$ in this example).
Therefore, signaling disappears when the cost parameter $\alpha$ exceeds
a certain threshold. This result agrees with the intuitive notion that
as signaling costs rise, it is no longer worthwhile to invest in the
higher educational levels.

Another threshold effect occurs if $\alpha$ is decreased. In this case,
the breakpoints will move in the opposite directions and coalesce in the
opposite order as before. The N = 2 case will again eventually be
reached, but now $x_1$ will coalesce with $x_0$, not $x_N$. Signaling costs
get so low that even the lowest ability types choose to pay a little more
for $u_1$ and receive the higher wage $v_1$. We still have "no signaling,"
but this solution is inefficient because the employees are
overinvesting in the signal. If they all signaled $u_0$, they could still receive
$\gamma_2 = \bar{x}$ but a higher $J_1$. However, this solution can be considered an
equilibrium if we assume that each individual maximizes his own net
profit based on the current wages, and that there is no central force
(e.g., a union) deciding what is best for the employee population as a
whole.
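For the two-signal case the thresholds on the cost parameter follow directly from (2.17). The sketch below assumes the Example 2.3 interval $[1, 2.5]$ and unit signal spacing; the function names are illustrative.

```python
def two_signal_breakpoint(alpha, x0=1.0, xN=2.5, du=1.0):
    """Breakpoint x1 from (2.17) for two signals, du = u1 - u0."""
    return 2.0 * alpha * du / (xN - x0)


def cost_thresholds(x0=1.0, xN=2.5, du=1.0):
    """Values of alpha at which x1 coalesces with x0 and with xN."""
    low = x0 * (xN - x0) / (2.0 * du)   # x1 = x0: everyone chooses u1
    high = xN * (xN - x0) / (2.0 * du)  # x1 = xN: everyone chooses u0
    return low, high


low, high = cost_thresholds()
print(low, high)
print(two_signal_breakpoint(low), two_signal_breakpoint(high))
```

Between the two thresholds the breakpoint lies strictly inside the ability interval and both signals are used; outside that band one of the signals is abandoned.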


7.3 Variability in the Unknown

Another type of threshold occurs when the variability of the
unknown x, the underlying signal, changes. One such parameter is
the variance of x, which is proportional to $(x_N - x_0)^2$ in the uniform
distribution case of Example 2.3. Since $x_N - x_0$ occurs in the
denominator of the expressions in (2.15) and (2.17), decreasing
$x_N - x_0$ for fixed $\alpha$ has exactly the same effect as increasing $\alpha$ for
fixed $x_N - x_0$. Again, the breakpoints shift and coalesce until
everyone chooses $u_0$, and signaling disappears. This result has several
intuitive explanations. First of all, it means that as people become
more homogeneous, it becomes less important to differentiate them.
In other words, the information to be sent through the signal is less
worthy of much effort. To see the second meaning, we must observe
how the wages are changing. Since

$$v_1 - v_0 = \frac{x_N - x_0}{2}, \tag{2.18}$$

the difference in wages also decreases. The wages would eventually be
close enough so that even the individual of highest ability might just as
well take the lower (and cheaper) signal $u_0$, since he cannot receive
a significantly higher wage by choosing the higher signal.

However, changing $x_N - x_0$ is more complicated than changing
$\alpha$ because not only do the breakpoints move, but so do the endpoints.
If $x_0$ increases, then it may catch up with $x_1$ before $x_1$ coalesces
with $x_N$. In this case, everyone would choose the higher signal $u_1$.
To understand the circumstances under which all people choose the
more expensive signal, even though the difference in wages is still
decreasing, we must analyze three separate cases where $x_N - x_0$
is decreased by shrinking the interval $[x_0, x_N]$.

CASE 1: $x_0$ fixed and $x_N$ decreased

Since $x_0$ is fixed, it cannot catch up with $x_1$, so that $x_1$
coalesces with $x_N$.

CASE 2: $x_0$ increased and $x_N$ decreased at the same rate†

If every time $x_0$ is increased by $\Delta$, $x_N$ is decreased by $\Delta$,
then $\bar{x}$ stays constant and always equals $(x_0 + x_N)/2$. Then from
(2.18), $v_1 - v_0 = \bar{x} - x_0$. In order for $x_0$ to prefer $u_0$ (and thus
maintain two-level signaling), we must have

$$\frac{u_1 - u_0}{x_0} > \bar{x} - x_0$$

or

$$u_1 - u_0 > x_0(\bar{x} - x_0). \tag{2.19}$$

A graphical description of the right hand side of (2.19) is shown in
Figure 2.11. If $u_1 - u_0 > \bar{x}^2/4$, or if $u_1 - u_0 < \bar{x}^2/4$ and the initial
$x_0 > x_0''$,* then (2.19) holds and $x_1$ eventually coalesces with $x_N$
as $x_0$ increases. If, on the other hand, $u_1 - u_0 < \bar{x}^2/4$ and $x_0 < x_0'$,
then the right hand side of (2.19) increases until (2.19) is violated and
$x_1$ coalesces with $x_0$.** The intuitive reasons for this are twofold.
First of all, $u_1 - u_0$ must be small enough so that there is not so
much difference in cost between the signals. Secondly, $x_0$ must be
sufficiently small, so that the average productivity (also wage) for the
lower group, namely $v_0$, is then much smaller than $v_1$, so that by
the time $v_0$ comes close to $v_1$, the lowest ability group has already
decided that $v_1$ is enough of an inducement to choose $u_1$.

†Similar analyses can be done if $x_0$ is increased and $x_N$ decreased
at different rates.
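The two critical values of $x_0$ in Case 2 are the roots of $x_0(\bar{x} - x_0) = u_1 - u_0$, so the outcome can be classified from the initial $x_0$. The sketch below is illustrative: the function names are hypothetical, and the numbers $\bar{x} = 3$ and $u_1 - u_0 = 2$ are assumed values chosen only to give round roots, not parameters from the original text.

```python
import math


def case2_roots(x_bar, du):
    """Roots of x0 * (x_bar - x0) = du, or None if du > x_bar**2 / 4."""
    disc = x_bar * x_bar - 4.0 * du
    if disc < 0.0:
        return None
    r = math.sqrt(disc)
    return (x_bar - r) / 2.0, (x_bar + r) / 2.0


def case2_outcome(x0_init, x_bar, du):
    """Classify where x1 ends up as x0 rises and xN falls at the same rate."""
    roots = case2_roots(x_bar, du)
    if roots is None or x0_init > roots[1]:
        return "x1 coalesces with xN"   # (2.19) holds throughout
    if x0_init < roots[0]:
        return "x1 coalesces with x0"   # (2.19) is eventually violated
    return "(2.19) already violated"


print(case2_roots(3.0, 2.0))
print(case2_outcome(0.5, 3.0, 2.0))
print(case2_outcome(2.2, 3.0, 2.0))
```

With these assumed numbers the roots are 1 and 2, so an initial $x_0$ of 0.5 leads to everyone choosing the higher signal, while an initial $x_0$ of 2.2 leads to everyone choosing the lower one.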

*As shown in Figure 2.11, $x_0'$ and $x_0''$ are defined as
$x_0''(\bar{x} - x_0'') = x_0'(\bar{x} - x_0') = u_1 - u_0$, with $x_0' < x_0''$.

**If $x_0 < x_0'$, then initially $x_1 = (u_1 - u_0)/(\bar{x} - x_0) < x_0' = (u_1 - u_0)/(\bar{x} - x_0')$.
When $x_0$ catches up with $x_1$ at the value $x_0'$, $x_1$ cannot have
coalesced already with $x_N$, because $x_0' < \bar{x} < x_N$.

CASE 3: $x_0$ increased and $x_N$ fixed

From (2.18), two-level signaling continues if

$$2(u_1 - u_0) > x_0(x_N - x_0). \tag{2.20}$$

A graphical description of the right hand side of (2.20) is shown in
Figure 2.12. By an analysis similar to that of Case 2, $x_1$ coalesces
with $x_N$ unless $2(u_1 - u_0) < x_N^2/4$ and $x_0 < x_0'$. The same intuitive
arguments hold as in Case 2.

The conclusion from the preceding discussion is that decreasing
the range $[x_0, x_N]$ by increasing $x_0$, decreasing $x_N$, or both results
in "no signaling." The parameters of the problem determine whether
the employees choose the higher or lower signal.

A similar analysis in Appendix II-A describes what happens
when $x_N - x_0$ increases. In general, $x_1$ will decrease and coalesce
with $x_0$, resulting in "no signaling." However, depending on other
parameters, $x_0$ may decrease faster than $x_1$, so that $x_1$ never
catches up to $x_0$. Differentiated signaling continues until $x_0$ reaches
zero (recall that $x_0$ was assumed to be positive).

7.4 Signaling Noise

The third type of parameter we want to investigate is signaling
noise. Suppose now that the employer has a noisy measurement of
education and observes $y = u + \epsilon$ instead of u, where $\epsilon$ is the noise.
Then his strategy is a function of a noise-corrupted signal:

$$v = \gamma_2(y) = \gamma_2(u + \epsilon) = \gamma_2(\gamma_1(x) + \epsilon).$$

The equilibrium condition remains the same as before, that is

$$\gamma_2(y) = E[s(u, x) \mid y]. \tag{2.21}$$

It is difficult to give an economic interpretation to $\epsilon$. If we assume
that "educational level" reflects a ranking of a composite measure
of years of education, performance, courses, and quality of the school,
and we assume that this ranking is known to employer and employees
alike, then it appears that "noise" can only mean interference in the
communication link between what the individual does and what the
employer observes. But if the job application form is complete enough
and the individuals do not lie, then noise, in this sense, should be
eliminated. However, we can still treat $\epsilon$ as a purely mathematical
entity. (In the next chapter, noise will play a more important role.)

Continuing to use Example 2.3 to illustrate the main ideas, we
assume $\epsilon$ has a uniform distribution between some -b and b. For
the case of N = 2, the employees' strategy $\gamma_1$ remains a step function
with two signaling levels, as in the case of no signaling noise. However,
as will be shown next, the employer's strategy does not remain a
two-step wage schedule. To see this, refer to Figure 2.13, where $\gamma_2$
is plotted vs. y, not u. Assuming $u_0$ and $u_1$ are fixed signals, any
y in an interval of ±b around $u_0$ and $u_1$ could be observed by the
employer. If a y between $u_0 - b$ and $u_1 - b$ is observed, the
employer knows that $u_0$ was signaled, so that x must be between $x_0$
and $x_1$. The wage $v_0$ is the average productivity for that interval,
namely, $(x_0 + x_1)/2$ for this example. Similar arguments can be made
to determine $v_1$ and $v_2$, as shown in Figure 2.13. Therefore, the
two-step wage schedule becomes a three-step schedule if the noise is
uniformly distributed and there is a central overlap region of
uncertainty.*

*For the case of Gaussian noise, the wage schedule is a continuous
function, not a step function. However, the breakpoint equilibrium
conditions can still be determined. The details are complicated, and
so are omitted here.

Given the three-step wage schedule, the individuals now want to
maximize expected net profit, where the expectation is taken over both
x and y. This means

$$\max_{i \in \{0, 1\}} \left\{ E_{y|x}\big[\gamma_2(u_i + \epsilon)\big] - \frac{u_i}{x} \right\},$$

so that the equilibrium condition of indifference at the breakpoint $x_1$
is

$$\bar{v}_0 - \frac{u_0}{x_1} = \bar{v}_1 - \frac{u_1}{x_1}, \tag{2.22}$$

the same as before but with wages $v_0$ and $v_1$ replaced by expected
wages $\bar{v}_i = E_{y|x}[\gamma_2(u_i + \epsilon)]$. More precisely,

$$\bar{v}_0 = E_{y|x}\big[\gamma_2(u_0 + \epsilon)\big] = \int_{u_0 - b}^{u_1 - b} v_0\, p(y|x)\, dy + \int_{u_1 - b}^{u_0 + b} v_1\, p(y|x)\, dy = v_0 \frac{\Delta u}{2b} + v_1 \frac{2b - \Delta u}{2b} \tag{2.23}$$

where $\Delta u = u_1 - u_0$. Similarly,

$$\bar{v}_1 = E_{y|x}\big[\gamma_2(u_1 + \epsilon)\big] = v_1 \frac{2b - \Delta u}{2b} + v_2 \frac{\Delta u}{2b}. \tag{2.24}$$

Thus, the expected wage $\bar{v}_i$ is just a weighted average of wages $v_i$
and $v_{i+1}$. From (2.23) and (2.24), (2.22) becomes

$$x_1 = \frac{2b}{v_2 - v_0} = \frac{4b}{x_2 - x_0}. \tag{2.25}$$

(Also, we must have $x_0 \le x_1 \le x_2$.)
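The indifference condition under uniform noise can be verified numerically. The sketch below is an illustration under assumptions: it uses $v_0 = (x_0 + x_1)/2$, $v_1 = \bar{x}$, $v_2 = (x_1 + x_2)/2$ as described for Figure 2.13, requires an overlap region ($\Delta u \le 2b$), and takes $x_0 = 1$, $u_0 = 0$, $u_1 = 1$ from Example 2.3 together with b = .6 and $x_2 = 2.5$ as illustrative inputs. The function names are hypothetical.

```python
def expected_wages(v0, v1, v2, du, b):
    """Expected wages (2.23) and (2.24) under uniform noise on [-b, b]."""
    w = du / (2.0 * b)  # probability of landing in the unambiguous band
    return v0 * w + v1 * (1.0 - w), v1 * (1.0 - w) + v2 * w


def noisy_breakpoint(b, x0, x2):
    """Breakpoint x1 = 4b / (x2 - x0) obtained from the indifference condition."""
    return 4.0 * b / (x2 - x0)


x0, x2, b = 1.0, 2.5, 0.6   # assumed illustrative values
u0, u1 = 0.0, 1.0
x1 = noisy_breakpoint(b, x0, x2)

v0, v1, v2 = (x0 + x1) / 2.0, (x0 + x2) / 2.0, (x1 + x2) / 2.0
vb0, vb1 = expected_wages(v0, v1, v2, u1 - u0, b)

# Net profit of the breakpoint individual under each signal.
print(x1, vb0 - u0 / x1, vb1 - u1 / x1)
```

The two net profits printed for the breakpoint individual coincide, which is the indifference condition (2.22), and the breakpoint grows linearly with b.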


EXAMPLE  2.5.  Let 


b = .6, 


x^  = 2.5 


Then 


It is easy to see from (2.25) that as the signal becomes noisier
and b increases, $x_1$ increases and, in general, coalesces with $x_2$.
Everyone chooses the cheaper signal $u_0$. Therefore, as we would
expect, if the signal is too noisy, signaling will cease. However, $x_1$
may not coalesce with $x_2$. Referring to Figure 2.13, we see that as
b increases, the interval of y's which are paid $v_1 = \bar{x}$ expands. The
other intervals remain constant in size, but the left interval shifts
to the left and the right interval to the right. Eventually, negative
values of y will appear. Since negative signals do not make sense from
an economic point of view, we stop increasing b when $u_0 - b = 0$.
If b increases to $u_0$ before $x_1$ reaches $x_2$, then $u_0 - b = 0$, and
differentiated signaling remains in effect.

If b decreases, $x_1$ decreases and coalesces with $x_0$. This
result is surprising, since it says that when the noise is small enough,
everyone chooses the more expensive signal. To understand this, we
must again look at how the difference in the wages is changing:

$$\bar{v}_1 - \bar{v}_0 = \frac{\Delta u}{2b} (v_2 - v_0) .$$

Thus, as b decreases, $\bar{v}_1 - \bar{v}_0$ increases. Eventually, the expected
wage $\bar{v}_1$ will be large enough, so that choosing the higher signal
becomes worthwhile.
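The weighted averages in (2.23) and (2.24) can be checked numerically. The following Python sketch is only an illustration: the function names and all numerical values are hypothetical, and it assumes uniform noise on [-b, b] with a nonempty overlap region (2b at least as large as the signal difference). It compares a brute-force average of the three-step schedule with the closed-form weights.

```python
def step_wage(y, v, u0, u1, b):
    """Three-step wage schedule seen by the employer (needs 2*b >= u1 - u0)."""
    v0, v1, v2 = v
    if y < u1 - b:
        return v0   # y could only have come from the lower signal u0
    if y <= u0 + b:
        return v1   # overlap region: y could have come from u0 or u1
    return v2       # y could only have come from the higher signal u1


def expected_wage_numeric(u, v, u0, u1, b, steps=20000):
    """Midpoint-rule average of the wage over y = u + eps, eps uniform on [-b, b]."""
    h = 2.0 * b / steps
    total = 0.0
    for k in range(steps):
        eps = -b + (k + 0.5) * h
        total += step_wage(u + eps, v, u0, u1, b)
    return total / steps


def expected_wages_closed_form(v, u0, u1, b):
    """Weighted averages (2.23) and (2.24)."""
    v0, v1, v2 = v
    du = u1 - u0
    vbar0 = v0 * du / (2 * b) + v1 * (2 * b - du) / (2 * b)
    vbar1 = v1 * (2 * b - du) / (2 * b) + v2 * du / (2 * b)
    return vbar0, vbar1


if __name__ == "__main__":
    v = (1.0, 1.5, 2.0)          # hypothetical wages v0, v1, v2
    u0, u1, b = 1.0, 1.5, 0.6    # hypothetical signals and noise half-width
    print(expected_wages_closed_form(v, u0, u1, b))
    print(expected_wage_numeric(u0, v, u0, u1, b),
          expected_wage_numeric(u1, v, u0, u1, b))
```

In this sketch the gap between the two expected wages equals the signal difference divided by 2b, times the gap between the highest and lowest wage, so it widens as b shrinks, which is the effect described above.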


7.5 Summary of Threshold Effects

The results of this section will now be summarized. First of
all, if the signaling cost parameter $\alpha$ is increased, then the cost of
education gets too high, and everyone chooses the cheapest signal. If
$\alpha$ is decreased, then the opposite happens: cost of education becomes
so low that it becomes worthwhile to pay a little more for the higher
signal and receive the higher wage.

All of the statements are in reference to Example 2.3.
Changing the variability of the unknown state of the world x is
more complicated than changing $\alpha$. The parameter in the former
case is the length $x_N - x_0$ of the interval $[x_0, x_N]$. The results are
summarized in Table 2.3. In general, if the variability is too small,
then the individuals are not differentiated enough to make it worthwhile
for them to signal, so that everyone chooses the cheaper signal $u_0$.
However, if the difference in the wages decreases more slowly than the
difference in the costs for the lowest ability group $x_0$, then the higher
wage is enough of an inducement for all individuals, even the lowest
ability types, to choose the more expensive signal. If the variability
increases, we would expect people to continue signaling, since they
are becoming more dissimilar. However, again the results depend on
how the difference in the wages is changing. If it is increasing
faster than the difference in costs for the lowest ability group, then
eventually everyone will choose the higher signal in order to receive
the higher wage. Therefore, the threshold effects for the variability
of x depend on other parameters of the problem.

The  last  parameter  is  signaling  noise.  It  was  shown  that  if  the 
noise  is  too  high,  then  it  becomes  too  difficult  for  the  employer  to 
determine  the  ability  from  the  signal,  so  that  everyone  chooses  the 
cheaper  educational  level.  If  the  noise  is  small,  then  the  signal  is  a 
better  indication  of  ability,  but  a secondary  effect  takes  over.  The 
expected  wage  from  the  more  expensive  signal  is  sufficiently  high  to 
induce  everyone  to  choose  that  signal. 

The details for increasing $x_N - x_0$ are discussed in Appendix II-A.


TABLE 2.3

Threshold Effects from Changing $x_N - x_0$

Case 1: $x_0$ fixed
   Decreasing $x_N - x_0$: $x_1 \to x_N$. All choose $u_0$.
   Increasing $x_N - x_0$ (see Appendix II-A for details): $x_1 \to x_0$.
   All choose $u_1$.

Case 2: $x_0$, $x_N$ changed at the same rate
   Decreasing $x_N - x_0$: $x_1 \to x_N$, unless $u_1 - u_0$ and $x_0$ are
   sufficiently small so that $v_1 - v_0$ decreases slower than
   $(u_1 - u_0)/x_0$ in such a way that $x_0$ catches up with $x_1$ before
   $x_1$ reaches $x_N$. Then $x_1 \to x_0$.
   Increasing $x_N - x_0$ (see Appendix II-A for details): Differentiated
   signaling continues as $x_0 \to 0$, unless $u_1 - u_0$ is sufficiently
   small and $x_0$ sufficiently large so that $v_1 - v_0$ increases faster
   than $(u_1 - u_0)/x_0$ in such a way that $x_1$ catches up with $x_0$.
   Then $x_1 \to x_0$.

Case 3: $x_N$ fixed
   Decreasing $x_N - x_0$: Same argument as in Case 2.
   Increasing $x_N - x_0$: Same argument as in Case 2.
APPENDIX II-A

THRESHOLD EFFECTS FOR INCREASING VARIABILITY OF x

Increasing $[x_0, x_N]$ for c = u/x, p(x) uniform

The arguments in this section are completely analogous to those
in Section 7.3 and will refer to equations and figures from there.
(See Table 2.3 for a summary.) It is easy to see from (2.17) that if
$x_N - x_0$ is increased, then $x_1$ decreases and, in general, coalesces
with $x_0$. Everyone chooses the more expensive signal $u_1$. This is
because, from (2.18), $v_1 - v_0$ increases. As $v_1$ becomes significantly
higher than $v_0$, it eventually becomes worthwhile for even the
lowest ability group to choose the higher signal. However, this result
again depends on $u_1 - u_0$. It may be that $v_1$ can never be sufficiently
large to attract all employees. This means that even though $x_1$
decreases as $x_N - x_0$ increases, $x_0$ decreases faster than $x_1$. Then $x_1$
can never catch up to $x_0$, so that differentiated signaling continues.
Again, we must consider three cases:

CASE 1A: $x_0$ fixed and $x_N$ increased

Since $x_0$ is fixed, $x_1$ can catch up to it.

CASE 2A: $x_0$ decreased and $x_N$ increased at the same rate*

Referring to Figure 2.11, in a manner completely analogous to
Case 2 in Section 7.3, we see that if $u_1 - u_0 > \bar{x}^2/4$, or if
$u_1 - u_0 < \bar{x}^2/4$ but the initial $x_0 < x'$, then the breakpoint $x_1$
decreases but never catches up with $x_0$. The employees continue to
signal at different levels. If $u_1 - u_0 < \bar{x}^2/4$ and $x_0 > x''$, then as
$x_0$ decreases, the right hand side of (2.19) increases until (2.19) is
violated, $x_1$ coalesces with $x_0$, and signaling disappears.
Intuitively, if $u_1 - u_0$ is sufficiently small, then as $v_1 - v_0$
increases, it will eventually be worthwhile for the lowest ability
group to pay a little more for $u_1$ and receive the much higher wage
$v_1$. If the initial $x_0$ is sufficiently large, then $c = u/x_0$ will not
be too large, so that it will again be worthwhile to purchase $u_1$.

*Since $x_0 > 0$, the system must be stopped before $x_0$ reaches 0.

CASE 3A: $x_0$ decreased and $x_N$ fixed

Figure 2.12 describes the situation here. Signaling continues
unless $2(u_1 - u_0) < x_N^2/4$ and $x_0 > x''$, in which case $x_1$ coalesces
with $x_0$. The same intuitive arguments hold as in Case 2A.

Thus, in the case of increasing the range $[x_0, x_N]$, signaling may
or may not disappear, depending on other parameters in the problem.

CHAPTER III


SIGNALING  AND  INFORMATION  THEORY 
1.  Introduction 

In the previous chapter, an example of a signaling problem was
analyzed in the context of economic theory. This chapter will analyze
signaling in the context of Shannon theory, also sometimes referred to
as classical information theory. This theory forms the foundation for
the following standard communication problem: send a message, or
signal, through a noisy channel so as to minimize the amount the signal
is distorted. We will show how the main components of this problem
can be captured in a team theory formulation (to be defined below)
with a signaling information structure. Figure 3.1 summarizes the
connection between this problem and the Spence problem.

Before going on to the next section, we need to describe what
a team problem is and how it relates to the Spence problem. A decision
and control problem is called a "team" problem when there is more
than one decision maker, each DM has different information, but all
DMs have the same objective function J. The Spence problem was
not a team problem, since $J_1 \ne J_2$, but was an example of a "nonzero-
sum (NZS) game" (called "nonzero-sum" because $J_1 + J_2 \ne 0$). An
"optimal" solution in the Spence problem was characterized as a Nash
equilibrium, as defined in Section II.2. On the other hand, a "team
optimal" strategy pair $(\gamma_1^*, \gamma_2^*)$ is defined as (where J is to be
minimized):

$$J(\gamma_1^*, \gamma_2^*) \le J(\gamma_1, \gamma_2) \quad \text{for all admissible } (\gamma_1, \gamma_2) .$$

As shown by Radner [7], the first order necessary and sufficient
conditions are the same as for a Nash equilibrium, but the second
order convexity conditions are not. However, the second order
conditions in [7] are only for the case of static information structure,
and cannot be carried over to dynamic information. This is because
$\gamma_2$ is a function of $\gamma_1$, so that the convexity of $J(\gamma_1, \gamma_2(\gamma_1))$ in both
$\gamma_1$ and $\gamma_2$ cannot be determined until $\gamma_2$ is specified. Therefore,
"convexity" of an objective function when the information is dynamic
is yet to be defined.

2.  Communication  System  as  Team  Problem 

The major problem in communication theory is to send information
from a source through a channel to a receiver in the most "reliable"
way, where "reliability" is yet to be defined. Wyner [14] says that,
in general, there are two limitations on the reliability of the
communication system. First of all, the channel may have noise,
such as static in a radio channel. The second limitation is what Wyner
calls "source-channel mismatch." For example, the source may emit
binary symbols, as with a computer, whereas the channel may be able
to accept only continuous data, as with a radio. Also, the rate at which
the source emits data may be different from the rate at which the
channel can process it. To combat these limitations, an encoder is
placed between the source and channel, and a decoder between the
channel and receiver. Figure 3.2 illustrates these ideas for the
standard model of a communication system (see [8], [14]). First of
all, a (memoryless**) source emits source symbols, or messages,
denoted as x,* at a rate $\rho_s$ symbols/second, which go into the
encoder and come out as signals, or codewords, u. The functional
relationship between x and u as defined by the encoder is denoted as
$u = \gamma_1(x)$. Next, the signals u are transmitted through a (memoryless)
channel that processes inputs at a rate $\rho_c$ symbols/second, and may be
corrupted by noise $\epsilon$. We will be concerned with additive noise, so
that the received signal y coming out of the channel will be expressed
as $y = u + \epsilon$. Lastly, the decoder converts y to symbols v, and
sends them to the receiver. Since the goal of the decoder is to
determine which message x was originally sent, we will also write v as
$\hat{x}$, so that the decoder is, in some sense, the inverse of the encoder.
As with the encoder, the decoder is defined by some function $\gamma_2$, so
that $v = \hat{x} = \gamma_2(u + \epsilon) = \gamma_2(\gamma_1(x) + \epsilon)$. We immediately see how
similar this looks to the Spence problem, where $\gamma_2$ is a noise
corrupted function of $\gamma_1$. More will be said about this below.

To complete the description of the channel, p(y|u), the transitional
probability density of y given u, and a cost function $\phi(u)$
must be specified; for example, $\phi(u) = u^2$. To complete the description
of the source, p(x), the probability density for the source output, and
D(x, v), the distortion function, must be specified; for example,
$D(x, v) = (x - v)^2$. Distortion is a measure of how v differs from x,
regardless of whether the signal has been sent through a channel or not.
An example of where nonzero distortion occurs without the presence of
a channel is in data compression, where less significant information is
deleted or condensed in order to transmit more significant information
more reliably.

*In order to relate this problem to the Spence problem later on, we
will use notation that matches the notation in the previous chapter,
not the notation that necessarily occurs in the information theory
literature.

**The memoryless assumption means, in general, that current behavior
does not depend on the past.

FIG. 3.2 COMMUNICATION SYSTEM

The notion of reliable transmission of information can now be
defined more precisely. The basic problem in communication theory
is to find an encoder and decoder so as to minimize average distortion
E[D(x, v)] subject to

$$E[\phi(u)] \le \alpha ,$$    (3.1)

where $\alpha$ is some fixed constant. Inequality (3.1) is a constraint on the
amount of signal power, where by "signal" we mean the channel input
variable u. Since minimizing distortion is the single goal, the problem
lends itself naturally to the following team formulation:

$$\min_{\gamma_1, \gamma_2} J = E[D(x, v)] = E[D(x, \gamma_2(\gamma_1(x) + \epsilon))] \quad \text{s.t.} \quad E[\phi(\gamma_1(x))] \le \alpha ,$$    (3.2)

where the objective is the distortion and the constraint is the
signaling power constraint;
that is, minimize average distortion subject to a power constraint as
the encoder and decoder are varied. Thus, DM1 is the encoder with
strategy $\gamma_1(x) = u$, and DM2 is the decoder with strategy $\gamma_2(y) = v$.
Since $v = \gamma_2(u + \epsilon) = \gamma_2(\gamma_1(x) + \epsilon)$, this problem exhibits precisely
what we set out to show, namely, signaling with noise, since $\gamma_2$ is a
function of a noise-corrupted $\gamma_1$. This particular notation shows how
similar this problem is to the Spence problem. In fact, if
$D(x, v) = (x - v)^2$, then J = E[D(x, v)] is precisely $J_2$ from the Spence
problem.
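To make the objective in (3.2) concrete, the following Python sketch evaluates it for one particular family of strategies that is not claimed to be optimal: a linear encoder u = a x and a linear decoder v = c y. It assumes squared-error distortion, the quadratic cost given as an example above, and zero-mean independent x and noise, so that only the two variances matter. The function names and numbers are hypothetical.

```python
import math


def linear_distortion(a, c, var_x, var_eps):
    """J = E[(x - c*(a*x + eps))^2] for zero-mean, independent x and eps."""
    return (1.0 - c * a) ** 2 * var_x + c ** 2 * var_eps


def signal_power(a, var_x):
    """E[u^2] for the linear encoder u = a*x."""
    return a ** 2 * var_x


def best_linear_pair(alpha, var_x, var_eps):
    """Spend all of the power budget alpha, then choose the decoder gain
    that minimizes J for that encoder gain (a least-squares estimate)."""
    a = math.sqrt(alpha / var_x)
    c = a * var_x / (a ** 2 * var_x + var_eps)
    return a, c


if __name__ == "__main__":
    alpha, var_x, var_eps = 2.0, 1.0, 0.5   # hypothetical values
    a, c = best_linear_pair(alpha, var_x, var_eps)
    print(signal_power(a, var_x), linear_distortion(a, c, var_x, var_eps))
```

Within this linear family the resulting distortion works out to var_x * var_eps / (alpha + var_eps); whether a nonlinear pair can do better is exactly the kind of question raised in the discussion that follows.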

Wyner [14] posed the communication problem in language which
easily transfers to the team problem in (3.2), although he does not
explicitly mention team theory. Witsenhausen [12], however, recognized
that the communication problem could be formulated as a team problem
with a dynamic information structure,* but did not investigate this
further. Whittle and Rudge [10] took the opposite point of view. They
started with a team problem and showed that it could be interpreted as
a communication problem. Their team problem was a more general
version of (3.2), where x, u, $\epsilon$, etc. represented infinite time
sequences,** so that they could use the results of information theory to
solve for the optimal value of (3.2).

Now that the communication problem has been reduced to a team
problem, several questions from a decision and control point of view
arise. First of all, an obvious question is: what is the team optimal
strategy pair $(\gamma_1^*, \gamma_2^*)$ for the problem in (3.2)? Once this pair has
been determined, a second obvious question is: what is the value of the
optimal objective $J^* = J(\gamma_1^*, \gamma_2^*)$? Our immediate response might be to
despair of ever answering these questions because of all the difficulties
associated with team problems with dynamic information, as discussed
in Section 1. In fact, Witsenhausen [11] extensively studied a similar
team problem without answering these questions. Fortunately, since
our team problem was motivated by a communication system, we can
take the same approach as in Whittle and Rudge, and use the results of
Shannon information theory to answer some, but not all, of these
questions.

*He called it a "nonclassical stochastic control problem."

**It will be shown later why the assumption of infinite sequences is
important in information theory.

If $\gamma_1^*$ and $\gamma_2^*$ cannot be found, or if they are very complicated
functions, then a next question to ask might be whether there are
suboptimal strategies whose objective J does not differ too much from
$J^*$, but which are easier to compute than $\gamma_1^*$ and $\gamma_2^*$. For example,
when are linear strategies, which are simple to express, optimal, and
when are they not optimal?

In the team formulation of (3.2), $\gamma_1$ and $\gamma_2$ could be mappings
between scalar variables. However, if the admissible strategy spaces
for $\gamma_1$ and $\gamma_2$ were expanded to include mappings between vectors,
then increasing the dimensions of x, u, $\epsilon$, etc. might result in lower
distortion than if the variables were restricted to being just scalars.
Certainly, by enlarging the strategy space, we cannot do worse and
may, in fact, do better. Therefore, this observation leads us to ask
how the dimensions of the variables x, u, etc. affect the solutions
$(\gamma_1^*, \gamma_2^*)$ and $J^*$. As mentioned above, in information theory, these
variables represent infinite sequences, so that they can be thought of
as infinite-dimensional vectors. Thus, information theory might be
able to tell us something about the effect of dimensionality on the
solution.
To summarize, the questions we want to answer are:

QUESTIONS:

1. What are the team optimal strategies $(\gamma_1^*, \gamma_2^*)$?

2. What is the value of the optimal objective $J^* = J(\gamma_1^*, \gamma_2^*)$?

3. Are there suboptimal strategies that are easy to implement,
and how does their objective J differ from $J^*$?

4. How do the dimensions of the variables x, u, etc. affect the
solutions $(\gamma_1^*, \gamma_2^*)$ and $J^*$?

Before we address them, we momentarily digress from our team theory
point of view to define the basic concepts and results from Shannon
information theory. (For more detail, see [1], [2], [8], [14], and [6].)


3. Shannon Theory*

3.1 Basic Concepts

Shannon theory provides the theoretical foundation for communication
theory by establishing an upper bound, called "channel capacity"
(C), on the amount of information that can be transmitted through a
channel. It also provides a quantitative measure of information that can
be used in characterizing not only the channel capacity, but also the rate
at which the source produces information, called the "source rate" (R).

*Those familiar with information theory may wish to skip this section,
since its purpose is solely to educate people, such as economists and
control theorists, who have little or no knowledge of information
theory. The intent is not to shed new light on Shannon's results, but
rather to define terminology and concepts for later use.
Intuitively, if R, the rate at which the source produces information,
is less than C, the maximum rate at which the channel can process
information, we would expect that the source and channel could be
joined in some way to produce a communication system that transmits
information at a rate R. This is exactly what Shannon's Coding
Theorem says, namely, that if R < C, then there exists an encoder
and decoder joining the source-receiver pair to the channel such that
information can be transmitted at a rate as close to C as desired
with arbitrarily small probability of error, in the limit as the length
(or duration) of the encoded messages gets sufficiently large. If
R > C, then the source is producing information faster than the channel
can process it, so that a certain amount of error is unavoidable. These
intuitive ideas will be made more precise later on when we return to
the Coding Theorem in more detail. Before we define what is meant by
"rates of information," we must define the concept of "information"
first. Shannon's abstract measure of information, to be described
next, is interesting, but, by itself, does not provide any new results.
Its real importance lies in the fact that, with this measure, the
important Coding Theorem could be proved.

The randomness inherent in the messages and signals of a
communication system implies that information is statistical, so that
any measure of it must involve probabilities. As mentioned in the
previous section, the particular probabilities required are the source
output and channel transition probabilities.* Therefore, these
probabilities will play a role in the definitions to follow.

*The discussion and definitions to follow will all be for the case of a
discrete source and channel, so that probabilities instead of densities
will be used. The information measure for the continuous case can be
similarly defined, but is more complicated to interpret and so will be
omitted here.

Since information can be defined in a purely statistical sense, the
definitions will first be stated in terms of abstract sets of events and
then interpreted in terms of a source and channel. To simplify matters,
think of a source generating symbols $a_i$ from a discrete, finite set A,
which are then input directly into a channel, emerging as output symbols
$b_j$ from a discrete, finite set B. In an abstract sense, A and B are
just random variables characterized by probabilities $\{p(a_i)\}$ and
$\{p(b_j)\}$, respectively. Then we have the following definitions:

1. Information: $I(a_i) = \log 1/p(a_i) = -\log p(a_i)$

   = amount of information received if told event $a_i$ has
     occurred.

Intuitively, if $p(a_i)$ is small, then a lot of information is received if
the unlikely event $a_i$ has occurred. If the log is in base 2, then the
unit of $I(a_i)$ is called a "bit." If it is in base e, then the unit is a
"nat." We will be using the "bit" notation in the rest of this chapter.
The particular choice of "log" comes about because it satisfies certain
desirable axioms. See [8] for a detailed discussion of these axioms,
and [15] for alternatives to log as the information measure.

2. Entropy (bits/symbol): $H(A) = E[I(A)] = \sum_i p(a_i) I(a_i)$

   = average amount of information received after being told
     what the source emitted
   = average prior uncertainty regarding what the source will
     emit
   = average number of "yes-no" questions to be answered
     to determine output
   = rate at which source produces information subject to
     no distortion (to be explained later).

3. Conditional Entropy: $H(A|b_j) = E_{A|b_j}[I(A|b_j)] = \sum_i p(a_i|b_j) I(a_i|b_j)$

   = average information from A given observation $b_j$.

4. Equivocation: $H(A|B) = E_B E_{A|B}[I(A|b)] = \sum_j p(b_j) H(A|b_j)$

   = average information from A given output is observed
   = average uncertainty of what source emitted after
     observing an output signal
   = average amount of information missing in the received
     signal
   = average amount of additional information that must be
     supplied per second at the receiving point to correct the
     received message.

5. Mutual Information: $I(A, B) = H(A) - H(A|B)$

$$= -\sum_i p(a_i) \log p(a_i) + \sum_j p(b_j) \sum_i p(a_i|b_j) \log p(a_i|b_j)$$

$$= \sum_i \sum_j p(a_i, b_j) \log \frac{p(a_i|b_j) \, p(b_j)}{p(a_i) \, p(b_j)}$$

$$= \sum_i \sum_j p(a_i, b_j) \log \frac{p(a_i, b_j)}{p(a_i) \, p(b_j)}$$

   = average information provided by observing one output
   = rate of transmission of information through the channel
   = measure of statistical dependence between A and B
     (the more dependent they are, the more information we
     get about A from observing B).
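Definitions 1 through 5 can be tried out on a small example. The following Python sketch is an illustration only: the function names are hypothetical, logs are taken in base 2 (bits), and the joint distribution used at the end is an assumed example (equiprobable binary inputs sent through a channel that flips each symbol with probability 0.1).

```python
import math


def self_information(p):
    """Definition 1: I = -log2 p, in bits."""
    return -math.log2(p)


def entropy(probs):
    """Definition 2: H(A) = sum_i p(a_i) I(a_i); zero-probability terms are skipped."""
    return sum(p * self_information(p) for p in probs if p > 0)


def equivocation(joint):
    """Definition 4: H(A|B), with joint[i][j] = p(a_i, b_j)."""
    h = 0.0
    for j in range(len(joint[0])):
        p_b = sum(row[j] for row in joint)
        for row in joint:
            if row[j] > 0:
                h -= row[j] * math.log2(row[j] / p_b)   # p(a_i|b_j) = p(a_i,b_j)/p(b_j)
    return h


def mutual_information(joint):
    """Definition 5, first form: I(A, B) = H(A) - H(A|B)."""
    p_a = [sum(row) for row in joint]
    return entropy(p_a) - equivocation(joint)


def mutual_information_direct(joint):
    """Definition 5, last form: double sum of p(a,b) log2 [p(a,b) / (p(a) p(b))]."""
    p_a = [sum(row) for row in joint]
    p_b = [sum(row[j] for row in joint) for j in range(len(joint[0]))]
    return sum(p * math.log2(p / (p_a[i] * p_b[j]))
               for i, row in enumerate(joint)
               for j, p in enumerate(row) if p > 0)


if __name__ == "__main__":
    joint = [[0.45, 0.05], [0.05, 0.45]]   # assumed example distribution
    print(entropy([0.5, 0.5]), equivocation(joint), mutual_information(joint))
```

The two forms of mutual information agree on this example, and both vanish when A and B are independent, in line with the interpretation as a measure of statistical dependence.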

The important consequence of these concepts is that now the source
rate R and channel capacity C can be expressed as solutions to a
pair of optimization problems involving mutual information. For a
general channel with power constraint (3.1), capacity can be written
as (using the notation from Figure 3.2)

$$C = C(\alpha) = \rho_c \sup_{p(u)} I(u, y) \quad \text{s.t.} \quad E[\phi(u)] \le \alpha ,$$    (3.3)

where the supremum is taken over all input probabilities satisfying the
constraint, and $C(\alpha)$ is in units of bits/second. As mentioned earlier,
$C(\alpha)$ is defined as the maximum rate at which information (in bits) can
be sent through the channel essentially error-free ("essentially" in the
sense that the probability of error can be made arbitrarily small).

Similarly, for a general source, the rate can be expressed as

$$R = R(\beta) = \rho_s \inf_{p(v|x)} I(x, v) \quad \text{s.t.} \quad E[D(x, v)] \le \beta ,$$    (3.4)

where

$$E[D(x, v)] \le \beta$$

is called the fidelity criterion and $R(\beta)$ is called the rate distortion
function, with $\beta$ a nonnegative constant. At first it might seem
strange to talk about minimizing a rate, since we always talk about
maximizing the transmission rate. But as Berger [2] points out, with
rate distortion functions, it is the source-receiver pair that is given,
not the channel. What is being minimized is, in some sense, the
time and effort it takes to code a message. Thus, as proved by
Shannon, $R(\beta)$ can be interpreted as the minimum number of binary
digits per second required to represent a message, subject to
distortion no more than $\beta$. If the source-receiver pair is to be linked to
a channel, then $R(\beta)$ can also be interpreted as the minimum capacity
that channel must have. $R(\beta)$ is a decreasing function of $\beta$, since a
higher distortion allowed means fewer binary digits needed to represent
the message. This is easy to see mathematically, since larger $\beta$
means expanding the set of admissible p(v|x) over which the infimum
is taken. Since entropy is the rate at which information is generated
subject to no distortion, the rate distortion function is just a
generalization of the concept of entropy. For discrete sources,
$R(0) = \rho_s H(x)$. For continuous sources, such as Gaussian,
$R(0) = \infty$, since a real number would require an infinite number of
bits to represent it perfectly.

The following are some examples of $C(\alpha)$ and $R(\beta)$ from
Wyner [14], derived directly from the definitions (3.3) and (3.4),
respectively.

EXAMPLE 3.1: Suppose we have a binary source such that
Pr{x = 0} = Pr{x = 1} = 1/2, and the distortion function D(x, v) = 0
if x = v and D(x, v) = 1 if $x \ne v$ (this implies that $E[D(x, v)] = P_e$,
where $P_e$ is the probability of error). Then

$$R(\beta) = \begin{cases} \rho_s (1 - h(\beta)) , & 0 \le \beta \le 1/2 \\ 0 , & \beta \ge 1/2 \end{cases}$$    (3.5)

where $h(\beta) = -\beta \log_2 \beta - (1 - \beta) \log_2 (1 - \beta)$, $0 < \beta \le 1/2$, and
$h(0) = \lim_{\beta \to 0} h(\beta) = 0$. From the definition of entropy and $h(\beta)$, we
see that h(1/2) = H(x) = 1, and $R(0) = \rho_s$ (see Figure 3.3). The
reason $R(\beta) = 0$ for $\beta \ge 1/2$ is because distortion $\beta = 1/2$ can be
attained by always guessing $v = \hat{x} = 0$. That is, the decoder output is
a stream of zeros, regardless of the input, so that no information is
being produced.
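The rate distortion function of Example 3.1 is easy to tabulate. The Python sketch below is an illustration under the assumptions of the example (equiprobable binary source, error-probability distortion); the function names and the default source symbol rate of one symbol per second are hypothetical choices.

```python
import math


def binary_entropy(beta):
    """h(beta) = -beta log2 beta - (1 - beta) log2 (1 - beta), with h(0) = h(1) = 0."""
    if beta <= 0.0 or beta >= 1.0:
        return 0.0
    return -beta * math.log2(beta) - (1.0 - beta) * math.log2(1.0 - beta)


def rate_binary(beta, rho_s=1.0):
    """Rate distortion function of Example 3.1, in bits/second."""
    if beta >= 0.5:
        return 0.0   # always guessing 0 already achieves distortion 1/2
    return rho_s * (1.0 - binary_entropy(beta))


if __name__ == "__main__":
    for beta in (0.0, 0.05, 0.1, 0.25, 0.5):
        print(beta, rate_binary(beta))
```

The printed values fall from the full source rate at zero distortion to zero at distortion 1/2, which is the shape described for Figure 3.3.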

EXAMPLE 3.2: Consider now a Gaussian source where x has
a Gaussian density function with zero mean and variance $\sigma^2$, and the
distortion function is $D(x, v) = (x - v)^2$. Then

$$R(\beta) = \begin{cases} \frac{\rho_s}{2} \log_2 \frac{\sigma^2}{\beta} , & 0 \le \beta \le \sigma^2 \\ 0 , & \beta \ge \sigma^2 \end{cases}$$    (3.6)

(See Figure 3.4.) The reason $R(\beta) = 0$ for $\beta \ge \sigma^2$ is because
$\beta = \sigma^2$ = variance(x) can be attained by guessing that x is the
prior mean; i.e., $v = \hat{x} = 0$ for this example. Again, this means
that the decoder output is all zeros, and no information is being
produced.
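A small Python sketch of the Gaussian case follows. It uses the standard rate distortion function of a Gaussian source under squared-error distortion, half the base-2 log of the ratio of source variance to allowed distortion (bits per symbol), scaled by the source symbol rate. The function name and default rate are hypothetical.

```python
import math


def rate_gaussian(beta, var_x, rho_s=1.0):
    """Rate distortion function of a zero-mean Gaussian source, in bits/second."""
    if beta >= var_x:
        return 0.0        # guessing the prior mean already achieves distortion var_x
    if beta <= 0.0:
        return math.inf   # a real number cannot be represented exactly with finitely many bits
    return 0.5 * rho_s * math.log2(var_x / beta)


if __name__ == "__main__":
    var_x = 4.0   # hypothetical source variance
    for beta in (0.25, 1.0, 2.0, 4.0):
        print(beta, rate_gaussian(beta, var_x))
```

Each factor of four reduction in the allowed distortion costs one more bit per source symbol in this sketch.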


FIG. 3.3 $R(\beta)$ FOR BINARY SOURCE

FIG. 3.4 $R(\beta)$ FOR GAUSSIAN SOURCE
EXAMPLE 3.3: Consider a Gaussian channel where noise $\epsilon$
has a Gaussian density with zero mean and variance $\sigma^2$, and the cost
function is $\phi(u) = u^2$. Then

$$C(\alpha) = \frac{\rho_c}{2} \log_2 \left( 1 + \frac{\alpha}{\sigma^2} \right) ,$$    (3.7)

where $\alpha/\sigma^2$ can be considered the signal-to-noise ratio. Figure 3.5
shows that C is an increasing function of $\alpha$; that is, capacity can be
increased if more channel input power is allowed.
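The dependence of capacity on input power can be tabulated with a few lines of Python. The sketch uses the standard capacity of an additive Gaussian noise channel under an average power constraint, half the base-2 log of one plus the signal-to-noise ratio (bits per channel use), scaled by the channel symbol rate. The function name and numerical values are hypothetical.

```python
import math


def capacity_gaussian(alpha, var_eps, rho_c=1.0):
    """Capacity of an additive Gaussian noise channel, in bits/second.

    alpha   : allowed average input power
    var_eps : noise variance
    rho_c   : channel symbols per second
    """
    return 0.5 * rho_c * math.log2(1.0 + alpha / var_eps)


if __name__ == "__main__":
    var_eps = 1.0   # hypothetical noise variance
    for alpha in (0.0, 1.0, 3.0, 15.0):
        print(alpha, capacity_gaussian(alpha, var_eps))
```

The values increase with the allowed power but only logarithmically, so each additional bit per channel use requires roughly a fourfold increase in signal-to-noise ratio at high power.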

Although $R(\beta)$ and $C(\alpha)$ were defined in terms of single input
and output symbols, actual coding does not, except in rare
circumstances, involve immediately sending each source symbol through
the channel, even if $\rho_s = \rho_c$. In order to combat noise limitations and
source-channel mismatch (such as when $\rho_s \ne \rho_c$), the encoder waits
for many source symbols and then codes them all together. A wider
range of codes is then available to the encoder, so that cleverer codes
can be constructed. Similarly, the decoder waits for many channel
outputs before it decodes. For example, if $\rho_s \ne \rho_c$, then the source
and channel are not synchronously compatible. In order to match them,
let the encoder wait T seconds until $n = \rho_s T$ symbols have been
emitted. In this time, the channel can process $N = \rho_c T$ symbols.
Thus, let x be an n-vector and u an N-vector; this is called block
coding.* The new vector source is called the n-th extension of the
original source, and the N-th extension of the channel can be similarly
defined. The "new" source symbol rate is now $\rho_s/n$ (in units of
n-vectors per second), which equals the "new" channel rate $\rho_c/N$
(N-vectors/second). Assuming that the vector components are independent,
we define

$$p(x) = \prod_{i=1}^{n} p(x_i) , \qquad p(y|u) = \prod_{i=1}^{N} p(y_i|u_i) .$$

Also $D_n$, the distortion of the n-th extension of the source, is

$$D_n = \frac{1}{n} \sum_{i=1}^{n} D(x_i, v_i) ,$$    (3.8)

and $\phi_N$, the cost of the N-th extension of the channel, is

$$\phi_N = \frac{1}{N} \sum_{i=1}^{N} \phi(u_i) .$$    (3.9)

*Block coding is used not only for synchronization, but also to combat
noise.
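The bookkeeping for block coding can be sketched in Python as follows. The block lengths are the symbol rates times the waiting time T, and the block distortion and block cost are taken here to be per-symbol averages over the block, which is an assumption made by analogy with (3.8). The function names are hypothetical, and rounding the products to integers is an added assumption for values of T that do not give whole numbers of symbols.

```python
def block_lengths(rho_s, rho_c, T):
    """Source and channel block lengths n and N for a waiting time of T seconds."""
    return round(rho_s * T), round(rho_c * T)


def block_distortion(x, v, D=lambda a, b: (a - b) ** 2):
    """Average per-symbol distortion between source block x and decoded block v."""
    return sum(D(xi, vi) for xi, vi in zip(x, v)) / len(x)


def block_cost(u, phi=lambda s: s ** 2):
    """Average per-symbol cost of channel input block u."""
    return sum(phi(ui) for ui in u) / len(u)


if __name__ == "__main__":
    rho_s, rho_c, T = 2.0, 3.0, 5.0          # hypothetical rates and delay
    n, N = block_lengths(rho_s, rho_c, T)
    print(n, N, rho_s / n, rho_c / N)        # both vector rates equal 1/T
    print(block_distortion([1.0, 2.0], [1.0, 4.0]), block_cost([1.0, -1.0, 2.0]))
```

Because both block lengths are proportional to the same delay, the vector source and the vector channel run at the same rate of one block per T seconds, which is what makes them synchronously compatible.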

As  previously  mentioned,  block  coding  arises  in  consideration  of 
actual  coding  techniques  to  minimize  distortion  and  increase  reliability 
in  a communication  system.  One  naive  approach  to  encoding  might  be 
to  just  repeat  each  scalar  source  symbol  many  times  through  the 
channel.  In  other  words,  for  each  source  symbol  x,  construct  a vector 
u whose  components  are  x's.  As  the  number  of  repetitions  increases, 
the  dimension  N of  u increases  for  a fixed  dimension  n of  x.  In 
the  limit,  this  scheme  will  drive  the  probability  of  error  to  zero  [4], 
but will pay a price. As N increases, the channel is taking longer and
longer to transmit the same amount of information being emitted from
the  source.  Therefore,  the  source  rate  is  decreasing  relative  to  the 
channel  rate.  This  means  that  the  channel  is  being  used  inefficiently, 
since  the  rate  at  which  it  can  handle  inputs  is  much  larger  than  the 
rate  at  which  information  is  being  produced.  This  tradeoff  of  rate  and 
reliability  was  thought  to  be  the  best  one  could  do,  until  Shannon  came 
along.  His  theorem  says  that  one  can  do  much  better;  that  is,  for  any 
fixed  rate  R,  the  probability  of  error  can  be  driven  to  zero  in  the 
limit  (and  thus  minimize  distortion)  by  simultaneously  increasing  N 
and  n and  choosing  clever  encoders  and  decoders.  This  is  the  really 
crucial point of Shannon's theorem. Thus, both methods of block
coding, i.e., repeating and the coding scheme referred to in Shannon's
theorem, are limiting results. As N and/or n increases to $\infty$, so
does  the  delay  T,  the  time  it  takes  to  emit  one  n-vector  from  the 
source  or  transmit  one  N-vector  through  the  channel.  In  the  case  of 
repeating,  this  delay  T is  incurred  every  time  a source  symbol  is 
emitted.  However,  in  Shannon's  theorem,  the  delay  is  incurred  just 
once,  at  the  beginning,  when  the  first  n-vector  is  emitted.  Then  the 
source  and  channel  are  matched  synchronously,  so  that  while  a source 
vector  is  being  produced,  the  previous  source  vector  is  simultaneously 
being  sent  through  the  channel.  It  takes  T seconds  to  accomplish 
both  these  tasks,  so  that  no  more  delay  is  incurred.  In  practice,  the 
initial delay is not significant relative to the entire time the
communication system is in operation.
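The price paid by the naive repeating scheme can be made concrete with a small calculation. The sketch below is illustrative only and rests on assumptions that are not in the text: a binary symmetric channel with crossover probability `p`, one source bit repeated an odd number of times, and majority-vote decoding. The function names are hypothetical.

```python
from math import comb


def majority_error_probability(p: float, n_repeats: int) -> float:
    """Exact probability that majority voting decodes a repeated bit wrongly.

    Assumes a binary symmetric channel with crossover probability p and an
    odd number of repetitions, so that ties cannot occur.
    """
    if n_repeats % 2 == 0:
        raise ValueError("n_repeats must be odd")
    threshold = n_repeats // 2 + 1  # a wrong decision needs this many flips
    return sum(
        comb(n_repeats, j) * p**j * (1.0 - p) ** (n_repeats - j)
        for j in range(threshold, n_repeats + 1)
    )


def source_symbols_per_channel_use(n_repeats: int) -> float:
    """Source symbols conveyed per channel use when each is sent n_repeats times."""
    return 1.0 / n_repeats


if __name__ == "__main__":
    # Error probability falls with more repetitions, but so does the rate.
    for n_repeats in (1, 3, 5, 11, 21):
        print(
            n_repeats,
            majority_error_probability(0.1, n_repeats),
            source_symbols_per_channel_use(n_repeats),
        )
```

Under these assumptions the error probability shrinks as the number of repetitions grows, while the number of source symbols carried per channel use shrinks like 1/N, which is the rate and reliability tradeoff described above.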

We are now ready to state the major result of classical information theory.

Shannon's Coding Theorem: Suppose a source and channel with $\alpha$ and $\beta$ specified are given. If

$$R(\beta) \le C(\alpha), \tag{3.10}$$

then for arbitrary $\delta_1 > 0$ and $\delta_2 > 0$, there exists a T sufficiently large and an encoder-decoder pair such that average cost satisfies $E[\theta] \le \alpha + \delta_1$ and average distortion satisfies $E[D] \le \beta + \delta_2$.

Proof: See [1] and [4].

Thus, $R(\beta)$ and $C(\alpha)$ are not exactly the source rate and channel capacity, respectively, but are approximations which become more exact as the delay T becomes large.

Converse to the Coding Theorem: If

$$R(\beta) > C(\alpha), \tag{3.11}$$

then there does not exist an encoder-decoder pair such that $E[\theta] = \alpha$ and $E[D] = \beta$.

Proof: See [1], [5], and [14].

In other words, (3.10) is the best we can do, for if (3.11) holds, then even in the limit, average distortion $\beta$ cannot be attained. Another way of stating the converse is that if $E[D] = \beta$ can be attained (approximately) at a cost $E[\theta] = \alpha$, then $\alpha$ and $\beta$ must satisfy (3.10). Then $\beta^*$, the solution to (3.10) with equality for given $\alpha$, is a lower bound (called the Shannon bound) for attainable distortions. However, the really important result is the Coding Theorem itself, which states that $\beta^*$ is actually attainable (in the limit as T increases).

3.2 Discussion

Several points can now be made about these results. First of all, as surprising as Shannon's theorem is, it has one major drawback: it is an existence theorem and, thus, does not provide a technique for actually constructing the encoder and decoder. The solutions to the optimization problems (3.4) and (3.3) for $R(\beta)$ and $C(\alpha)$, respectively, are not coders, but optimal probability density functions. They have limited usefulness in finding the optimal coders. For example, if an arbitrary coding scheme is constructed, its densities can be computed and compared against the solutions to (3.3) and (3.4) to see if the scheme is optimal. However, this seems to be about as far as one can go using only the Coding Theorem.

When we consider the rate distortion function together with the channel to find the minimum distortion, we can re-express $R(\beta)$ as (see [2])

$$\beta^* = \inf_{p(v \mid x)} E[D(x, v)] \quad \text{s.t.} \quad \rho_s I(x, v) \le C(\alpha). \tag{3.12}$$

This formulation is appealing because it seems more natural to minimize distortion rather than rate. Suppose $p^*(v \mid x)$ is the probability density that attains $\beta^*$, and $I^*(x, v)$ is the corresponding mutual information evaluated with $p^*(v \mid x)$. Then

$$R(\beta^*) = \rho_s I^*(x, v) = C(\alpha). \tag{3.13}$$

This illustrates the close connection between the Coding Theorem and minimizing distortion.

4. Optimal Payoff and Strategies

Now  that  the  fundamentals  of  Shannon  theory  have  been  established, 
we  can  return  to  the  five  team  theory  questions  posed  in  Section  2. 

This  section  will  address  the  first  two  questions.  First  of  all,  since 
the  Coding  Theorem  holds  only  when  infinite  sequences  of  source  and 
channel  symbols  are  allowed,  we  must  modify  the  team  formulation  to 
account  for  vectors  x,  u,  etc.  of  arbitrarily  large  dimension.  Then 
the  team  problem  becomes 


$$\min_{\gamma_1, \gamma_2} J = E\Big[\lim_{n \to \infty} D_n\Big] \quad \text{s.t.} \quad E\Big[\lim_{N \to \infty} \theta_N\Big] \le \alpha, \tag{3.14}$$

where $\gamma_1$ and $\gamma_2$ are now mappings between infinite vectors. Since the objective is to minimize distortion, the optimal value of J, call it $J^*$, is just $\beta^*$, which satisfies

$$R(\beta^*) = C(\alpha),$$

since $R(\beta)$ is a decreasing function of $\beta$ bounded from above by $C(\alpha)$. Therefore, Shannon's theorem immediately gives us the optimal payoff for the team problem (3.14), and so answers Question 2 in the limit as $n, N \to \infty$ (or, equivalently, $T \to \infty$).

As a graphical interpretation of $J^* = \beta^*$, recall Example 3.1, a discrete, binary source. Suppose we turn the problem around and ask the following question:*

* Wyner [14] formulated the problem this way, but not in the context of team theory.

Question: If we want to send data through a channel of capacity C (for fixed $\alpha$) with distortion no more than $\beta$, what is the maximum possible rate at the source?

Answer: Set $R(\beta) = C$ and solve for $\rho_s(\beta)$.

From (3.5), this gives

$$\rho_s(\beta) = \frac{C}{1 - \mathcal{H}(\beta)}. \tag{3.15}$$
A plot of $\rho_s$ vs. $\beta$ is shown in Figure 3.6. It shows that when distortion is allowed, the data rate can be faster than the channel capacity, a somewhat nonintuitive result, since capacity is thought of as a maximum rate.
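The shape of the rate curve can be explored numerically. The sketch below is hedged: it assumes that the binary source of Example 3.1 is equiprobable with error-frequency distortion, so that $R(\beta) = \rho_s (1 - \mathcal{H}(\beta))$ with $\mathcal{H}$ the binary entropy in bits. That form is the standard textbook result and is an assumption here, since (3.5) is not reproduced in this section. The function names are hypothetical.

```python
from math import log2


def binary_entropy(beta: float) -> float:
    """Binary entropy in bits; taken to be 0 at the endpoints."""
    if beta <= 0.0 or beta >= 1.0:
        return 0.0
    return -beta * log2(beta) - (1.0 - beta) * log2(1.0 - beta)


def max_source_rate(capacity: float, beta: float) -> float:
    """Largest source rate (symbols per second) satisfying R(beta) = capacity.

    Assumes R(beta) = rho_s * (1 - H(beta)) for 0 <= beta < 1/2.
    """
    if not 0.0 <= beta < 0.5:
        raise ValueError("beta must lie in [0, 1/2)")
    return capacity / (1.0 - binary_entropy(beta))


if __name__ == "__main__":
    for beta in (0.0, 0.01, 0.05, 0.11, 0.25):
        print(beta, max_source_rate(1.0, beta))
```

With zero allowed distortion the admissible source rate equals the capacity, and it grows as more distortion is tolerated, which is the behavior the text attributes to Figure 3.6.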

There is another way of getting this curve which illustrates the idea of (3.10). First plot $R(\beta)$ as a function of $\beta$ for different values of $\rho_s$ (see the solid curves in Figure 3.7). Then draw the line C. Shannon's result that $R(\beta) \le C$ defines the forbidden and attainable regions. The best bound, namely, the minimum attainable $\beta$ for a given $\rho_s$, is indicated by the circles, the points at which $R(\beta) = C$. Matching these optimal $\beta$'s with their corresponding $\rho_s$'s, we get the dotted curve, which is precisely the same curve as in Figure 3.6. Therefore, given the entire communication system, the points on the curve $\rho_s(\beta)$ are attained in three steps:

1. Fix $\alpha$ and solve for $C(\alpha) = C$ in (3.3).

2. Fix $\rho_s$ and solve for $R(\beta)$ in (3.4).

3. Set $R(\beta) = C$ and solve for $\rho_s(\beta)$.

If we solve (3.12) instead of (3.4), then we get (3.13) and can skip step 3 entirely.

FIG. 3.6 $\rho_s(\beta)$ FOR BINARY SOURCE

FIG. 3.7 OTHER DERIVATION OF $\rho_s(\beta)$ FOR BINARY SOURCE

This procedure for finding the minimum distortion can be thought of as a kind of "separation principle." The two-person team problem with dynamic information is replaced by two one-person control problems ((3.3) and (3.4)) with static information. Whittle and Rudge recognized this when they said: "Control and communication are both required but the controls operate on separate parts of the system, so that joint control is not required" (see [10], p. 366). However, as Witsenhausen [12] noted, this procedure is still not general enough to be applied to other team problems.

For fixed $\rho_c$, if the team problem could be solved, then it would yield $\beta(\rho_s)$. Inverting this function yields $\rho_s(\beta)$, precisely the same curve as in Figure 3.6 and the dotted curve in Figure 3.7. Thus, the condition $R(\beta) \le C(\alpha)$ is buried inside the team formulation. It is not yet clear whether this inequality can be derived from the viewpoint of team theory alone. However, it can be derived from team theory together with rate distortion theory, which yields the solid curves in Figure 3.7. For example, in Figure 3.7, suppose that the rate curves $R(\beta)$ and $\rho_s(\beta)$ are shown, but the line $C(\alpha)$ is not. We will derive this line in another way, without reference to $R(\beta) \le C(\alpha)$. Consider the points where a vertical line drawn through $\beta_i$ intersects $R(\beta; \rho_{s_i})$ (let $P_i = R(\beta_i; \rho_{s_i})$, i = 1, 2, 3). The interpretation of $\beta_i$ from $\rho_s(\beta)$ is that, given $\rho_{s_i}$, it is the minimum attainable distortion. Thus, for $\beta$ such that $R(\beta; \rho_{s_i}) \le P_i$, $\beta$ is attainable, and for $R(\beta; \rho_{s_i}) > P_i$, $\beta$ is not attainable. If we do this for all i, then we will find that the $P_i$'s are all equal; that is, the points of intersection lie on a horizontal line. Let $P_1 = P_2 = \cdots = C$. Then we immediately have the condition $R(\beta) \le C$ defining the attainable region.

Since the solutions of (3.3) and (3.4) yield only the optimal probabilities $p(u)$ and $p(v \mid x)$, the optimal encoder and decoder are still not known. In fact, $C(\alpha)$ and $R(\beta)$ are computed on a "per-symbol basis," that is, with regard to a single input-output pair, whereas the Coding Theorem is a statement about transmission of information when there are infinite sequences. Thus, in the language of decision and control theory, Shannon's theorem provides the optimal payoff (in the limit as $T \to \infty$) but not the optimal strategies. It is purely an existence theorem. Its nonconstructive nature has frustrated information theorists to this day. Therefore, Question 1 cannot, in general, be answered by Shannon theory.

5.  Real-Time  Information  Theory 

5.1 Introduction

In this section, the problem of suboptimal strategies (Question 3) will be raised in the context of a new approach toward solving communication problems. In the previous two approaches discussed in Section 3.2 (simple repeating and the cleverer coding scheme whose existence is proved by Shannon's theorem), it was assumed that sequences could be infinitely long, and thus incur an infinite delay. If the dimension of x is large, then the encoder must wait for the entire vector x before it starts to code. If the dimension of u is large, the decoder must wait for the entire vector u to go through the channel, symbol by symbol. A third approach, which is the one to be discussed in the rest of this chapter, is what we call "real-time information theory." In this approach, both N and n are fixed. That is, we consider block codes with a fixed block length. This situation might occur if the receiver is another DM who must make decisions in real time, i.e., without arbitrary delay. The team problem (3.14) now becomes

$$\min_{\gamma_1, \gamma_2} J = E[D_n] \quad \text{s.t.} \quad E[\theta_N] \le \alpha, \tag{3.16}$$

where $\gamma_1$ and $\gamma_2$ are mappings between finite vectors. With this extra restriction of fixed length, the optimal encoder and decoder in (3.16) may not attain the Shannon bound in (3.14). That is, they are suboptimal in the infinite delay problem. However, the formulation in (3.16) is actually closer to traditional team theory, which does not deal with infinite or arbitrary delays. This assumption of fixed dimensions also brings us closer to the Spence problem, which can now be looked upon as a NZS version of real-time information theory where n = N = 1. Since dimensionality is at issue, Question 4 will also be answered, which completes our list of team theory questions.

The Shannon bound is the best we can do if n and N are allowed to become arbitrarily large; that is, the admissible strategy space contains mappings between vectors of arbitrary lengths. The mappings between vectors of fixed length n and N constitute a subset of this space. Since its strategy space is more restricted than in the Shannon problem, the real-time team problem for fixed n and N cannot have a distortion less than $\beta^*$. Thus, $\beta^*$ is a lower bound for the real-time problem. Witsenhausen [12] also pointed this out, and noted that the bound may be quite loose. He refers to a paper by Ziv and Zakai [15] that proposes a way to find tighter bounds by replacing "log" by some other convex function in the definition of an information measure. However, these bounds are, in general, difficult to compute.

In the next section, we will investigate a particular example of a communication system and will show under what circumstances linear strategies for fixed n and N attain the Shannon bound. When this happens, the strategies are, therefore, team optimal. When this does not happen, the performance of the suboptimal linear strategies can be compared to $\beta^*$. If the performance is close to $\beta^*$, then the easy-to-implement linear strategies might be desirable.

5.2 Linear vs. Nonlinear Strategies

In order to provide a basis for comparison of optimal and suboptimal strategies, we will assume that, for all examples discussed in this section, the communication system in question has a Gaussian source and channel, as described in Examples 3.2 and 3.3 but with variance(x) = 1. Then for fixed $\alpha$, the minimum distortion $\beta^*$, derived from equating $R(\beta)$ from (3.6) and $C(\alpha)$ from (3.7), is

$$\beta^*(k) = \left( \frac{\sigma^2}{\alpha + \sigma^2} \right)^{k}, \tag{3.17}$$

where

$$k = \frac{\rho_c}{\rho_s}. \tag{3.18}$$

Then $D_n$ and $\theta_N$ from (3.8) and (3.9) become

$$D_n = \frac{1}{n} \sum_{i=1}^{n} (x_i - v_i)^2 = \frac{1}{n} (x - v)^T (x - v), \tag{3.19}$$

$$\theta_N = \frac{1}{N} \sum_{i=1}^{N} u_i^2. \tag{3.20}$$
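As a hedged worked step, the Gaussian Shannon bound can be recovered by equating rate and capacity. The derivation below assumes the standard forms $R(\beta) = \frac{\rho_s}{2} \log \frac{1}{\beta}$ for a unit-variance Gaussian source with squared-error distortion and $C(\alpha) = \frac{\rho_c}{2} \log(1 + \alpha/\sigma^2)$ for an additive Gaussian channel with power $\alpha$ and noise variance $\sigma^2$. These are taken to be what (3.6) and (3.7) state, which is an assumption, since those equations are not reproduced in this section.

```latex
\begin{align*}
\frac{\rho_s}{2}\,\log\frac{1}{\beta^*}
  &= \frac{\rho_c}{2}\,\log\!\left(1 + \frac{\alpha}{\sigma^2}\right) \\
\log\frac{1}{\beta^*}
  &= k\,\log\frac{\alpha + \sigma^2}{\sigma^2},
  \qquad k = \frac{\rho_c}{\rho_s} \\
\beta^*(k)
  &= \left(\frac{\sigma^2}{\alpha + \sigma^2}\right)^{k}
\end{align*}
```

For k = 1 this reduces to $\sigma^2/(\alpha + \sigma^2)$, the value identified with $\beta^*(1)$ in (3.26).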


First consider the special case of n = N = 1. Then the source and channel are matched synchronously, so that $\rho_c = \rho_s$, k = 1. The team formulation in (3.16) reduces to the simple form

$$\min_{\gamma_1, \gamma_2} J = E[(\gamma_2 - x)^2] \quad \text{s.t.} \quad E[\gamma_1^2] \le \alpha. \tag{3.21}$$

Since J is the same as the payoff from the Spence problem, the constraint does not depend on $\gamma_2$, and the first order conditions for the unconstrained team optimal are the same as the first order conditions for a Nash equilibrium [7], the optimal decoder is the same as in the Spence problem, namely,

$$\hat{x} = v = \gamma_2(y) = E[x \mid y]. \tag{3.22}$$

We cannot evaluate this conditional mean until we specify $\gamma_1$, since $y = \gamma_1(x) + \epsilon$. Suppose we let $\gamma_1$ be linear; that is,

$$u = \gamma_1(x) = ax, \tag{3.23}$$

where "a" is a scalar. Since x and $\epsilon$ are Gaussian random variables, $\gamma_2$ is also linear, and (3.22) becomes

$$\hat{x} = v = \gamma_2(y) = \frac{a}{a^2 + \sigma^2}\, y, \tag{3.24}$$

and the constraint in (3.21) becomes

$$a^2 \le \alpha.$$

For $a = \sqrt{\alpha}$, the constraint is satisfied and

$$J = E\left[ \left( \frac{\sqrt{\alpha}}{\alpha + \sigma^2}\, y - x \right)^2 \right] \tag{3.25}$$

$$= \frac{\sigma^2}{\alpha + \sigma^2} = \beta^*(1). \tag{3.26}$$

This linear scheme attains the Shannon bound, so that we immediately have the solution to the team problem in (3.21). Therefore, for the special case of N = n = 1, the linear strategies

$$\gamma_1(x) = \sqrt{\alpha}\, x \tag{3.27a}$$

$$\gamma_2(y) = \frac{\sqrt{\alpha}}{\alpha + \sigma^2}\, y = \hat{x} \tag{3.27b}$$

are optimal. This result is at first very surprising, because, as mentioned earlier, Witsenhausen [11] showed that for a similar team problem with signaling, the optimal linear solution was not the team optimal. However, the result (3.27) was also noted by Witsenhausen in a later paper [12], by Gallager [5], and by Whittle and Rudge [10].
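The claim that the scalar linear scheme meets the bound can be checked by simulation. The following is a minimal sketch, assuming a unit-variance Gaussian source, additive Gaussian noise of variance `sigma2`, an encoder gain of $\sqrt{\alpha}$, and a decoder gain of $\sqrt{\alpha}/(\alpha + \sigma^2)$. The function names, sample size, and seed are illustrative choices, not part of the original analysis.

```python
import math
import random


def simulate_scalar_linear_scheme(alpha, sigma2, n_samples=200_000, seed=0):
    """Monte Carlo estimate of E[(x - v)^2] for the scalar linear scheme.

    Encoder: u = sqrt(alpha) * x.  Channel: y = u + noise.
    Decoder: v = sqrt(alpha) / (alpha + sigma2) * y.
    """
    rng = random.Random(seed)
    encoder_gain = math.sqrt(alpha)
    decoder_gain = encoder_gain / (alpha + sigma2)
    noise_std = math.sqrt(sigma2)
    total = 0.0
    for _ in range(n_samples):
        x = rng.gauss(0.0, 1.0)
        y = encoder_gain * x + rng.gauss(0.0, noise_std)
        v = decoder_gain * y
        total += (x - v) ** 2
    return total / n_samples


def shannon_bound_k1(alpha, sigma2):
    """Closed-form value sigma^2 / (alpha + sigma^2) for n = N = 1."""
    return sigma2 / (alpha + sigma2)


if __name__ == "__main__":
    print(simulate_scalar_linear_scheme(1.0, 0.5), shannon_bound_k1(1.0, 0.5))
```

The empirical mean square error should agree with the closed form up to sampling error, which is what it means for the linear pair to attain the bound in this special case.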


For other cases of fixed n and N besides n = N = 1, linear strategies could again be tried and compared against $\beta^*(k)$. To make the source and channel synchronously compatible, choose n and N such that

$$\frac{N}{n} = k, \tag{3.28}$$

as described in the discussion of block coding in Section 3.1. The communication system now considered will involve an n-dimensional memoryless Gaussian source x with zero mean and covariance $I_n$ (n-dimensional identity matrix), and an N-dimensional additive Gaussian channel whose noise $\epsilon$ has zero mean and covariance $\sigma^2 I_N$ and is independent from x. For the encoder, a linear strategy means

$$u = Hx, \tag{3.29}$$

where H is an N × n matrix. It will be assumed to be of maximal rank, since this is required in the proof of Theorem 3.1 below. This assumption has the interpretation in equation (3.29) of requiring the components of u to be uncorrelated. Since we know from Shannon theory that the Shannon bound is attained when the inputs u to the channel are uncorrelated (see [5]), H having maximal rank is a reasonable assumption.

From (3.22), v can also be expressed as a function of H. The version of the team problem in (3.16) with $D_n$ and $\theta_N$ as in (3.19) and (3.20), respectively, now becomes:

$$\min_{H \in \mathcal{H}} J = \frac{1}{n} E\big[\operatorname{tr}\, (x - v(H))(x - v(H))^T\big] \quad \text{s.t.} \quad \frac{1}{N} E\big[\operatorname{tr}\, H x x^T H^T\big] \le \alpha, \tag{3.30}$$

where "tr" stands for trace and superscript "T" for transpose, and $\mathcal{H}$ is the set of N × n matrices of maximal rank.

THEOREM 3.1. Let $H^*$ be the optimal solution to (3.30). Then for $k \ge 1$

$$J(H^*; k) = \frac{\sigma^2}{\alpha k + \sigma^2}. \tag{3.31}$$

Proof. See Appendix III-A.

In the proof of Theorem 3.1, $H^*$ is derived in terms of its eigenvalues, not the matrix itself. However, as will now be shown, a particular $H^*$, with a simple interpretation, can be found. For $k \ge 1$, that is, the channel dimension is greater than or equal to the source dimension, a particular linear encoder is the one which corresponds to repeating (Corollary 3.1).

Proof. See Appendix III-A.

Therefore, for $k \ge 1$, repeating is as good as the best linear encoder.
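The gap between the best linear scheme and the Shannon bound can be tabulated directly. The sketch below uses the value $\sigma^2/(\alpha k + \sigma^2)$ from Theorem 3.1 and, as an assumption, the Gaussian bound $(\sigma^2/(\alpha + \sigma^2))^k$ that results from equating the standard Gaussian rate distortion and capacity formulas. The function names are hypothetical.

```python
def best_linear_distortion(alpha, sigma2, k):
    """Distortion of the best linear encoder, sigma^2 / (alpha*k + sigma^2), for k >= 1."""
    if k < 1:
        raise ValueError("the linear result is stated only for k >= 1")
    return sigma2 / (alpha * k + sigma2)


def shannon_bound(alpha, sigma2, k):
    """Assumed Gaussian Shannon bound, (sigma^2 / (alpha + sigma^2))**k."""
    return (sigma2 / (alpha + sigma2)) ** k


if __name__ == "__main__":
    alpha, sigma2 = 1.0, 0.5
    for k in (1, 2, 3, 5, 10):
        print(k, best_linear_distortion(alpha, sigma2, k), shannon_bound(alpha, sigma2, k))
```

The two values coincide at k = 1. For larger k the linear distortion falls only like 1/k while the bound falls geometrically, so the linear scheme drifts away from the bound as k grows.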

An immediate consequence of Theorem 3.1 is

COROLLARY 3.2. For the communication system described in Section 5.1, linear encoders and decoders are optimal if the source dimension n and channel dimension N are equal.*

Proof. If n = N, then k = 1 and $J(H^*; k = 1) = \beta^*(1)$. Q.E.D.

* Whittle and Rudge [10] prove a more general result for the case of channels with memory.

Figure 3.8 shows how $J(H^*)$ deviates from $\beta^*$ as k increases.

Since $\beta^*$ is calculated assuming infinite sequences, but $J(H^*)$ is not, the converse of Corollary 3.2 cannot automatically be asserted. It may be true that if the dimensions n and N are fixed, linear strategies are the best we can do. However, in Section 5.3 counterexamples for k > 1 and k < 1 will be described, where certain nonlinear coders give lower distortion than the best linear ones.

Before presenting the counterexamples, we first give the heuristic interpretation as to why linear is best for N = n but not necessarily for $N \ne n$ that was first proposed by Shannon [9] and later by Wozencraft and Jacobs [13]. Consider Figure 3.9 for n = N = 2 and Figure 3.10 for n = 1, N = 2. Figure 3.9 illustrates the linear case. The idea here is that a linear transformation maps the entire space of x's ($R^2$) to the entire space of u's ($R^2$); that is, it fills the u-space. To understand the significance of this, we must compare it with Figure 3.10. First we perform a transformation on the Gaussian random variable x so that it falls within a finite interval. This simplifies the explanation and is an important step in one of the counterexamples. Now, Corollary 3.1 says that the best linear transformation on x is as good as just repeating x twice, which implies that the optimal linear coder maps the finite x interval to the diagonal $u_1 = u_2 = x$ in the u-space. However, this does not take advantage of the higher dimensionality of u; that is, it does not "fill the space." A transformation that results in a curve that fills the space more than the diagonal is better than linear, as illustrated by the twisting curve in Figure 3.10.* Now, when the signal (as represented by the curve) goes through the channel, it is corrupted by noise. The advantage of the longer curve is that we can pack in more little "noise balls,"** assuming that the variance of the noise is very small in order to prevent accidentally jumping to the wrong part of the curve when the noise is added. In fact, Shannon points out that there is a threshold effect where the increased benefits by extending the curve are outweighed by the greater chance of committing a large error. Now, define the "stretch factor" S (see [13]) as:

$$S = \frac{\text{change in length along curve}}{\text{change in } x}.$$

If S is constant all along the curve and the noise is small, then, locally within the balls, the curve looks linear. If we straighten out the curve and compress it to fit in the original interval in x, then we have also compressed the noise balls. The net effect is that we have reduced the noise for the whole system, so that we get a lower distortion than linear.***

* Shannon [9] calls this idea the "snake-in-the-box."

** Although Gaussian noise extends beyond the boundaries of the noise balls, almost all of the probability density falls within a ball of radius $3\sigma$. Thus, packing in balls captures the conceptual idea.

*** In [13] this is called "twisted modulation."

5.3 Counterexamples

We now describe the counterexamples that show that linear strategies are not necessarily optimal when n and N are fixed, and $k \ne 1$. These examples utilize the "stretched curve" idea to construct nonlinear encoders and decoders that are better than linear. For simplicity, we assume that the random variables are normalized so that $\alpha = 1$.


COUNTEREXAMPLE 3.1.* k > 1.

Let n = 1 and N = 2, so that $x \sim N(0, 1)$ and $\epsilon \sim N(0, \sigma^2 I_2)$. Divide x into four regions such that the probability of each region is 1/4 (see Figure 3.11); that is,

$$\int_0^A p(x)\,dx = \int_A^\infty p(x)\,dx = \frac{1}{4} \quad \text{for} \quad p(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2} \;\Rightarrow\; A \approx .67.$$

Since N = 2, the encoder must take x to some two-dimensional vector

$$u = \begin{bmatrix} u_1 \\ u_2 \end{bmatrix}.$$

The particular encoder used in this example is to let $u_1$ represent the region x is from, and let $u_2$ be a linear transformation of x in a stretched out version of this region. Figure 3.11 gives a graphical interpretation of this scheme. More precisely, let r(x) = region number of x. Then it can be verified that $u_2$ can be expressed as

$$u_2 = B\,(x + 5 - 2r(x)), \tag{3.33}$$

(see Figure 3.11 for illustration of B) and the stretch factor S is constant for all x within a given region, that is, for fixed r.

* See Appendix III-B for details.

For algebraic simplicity, let

$$\tilde{u}_2 = u_2 + \epsilon_2.$$

Now, $u_1 = cr$, where c is a constant chosen to satisfy the power constraint, so that

$$y_1 = u_1 + \epsilon_1 = cr + \epsilon_1 \;\Rightarrow\; p(y_1 \mid r) \sim N(cr, \sigma^2).$$

Let $\hat{r}$ = maximum likelihood estimate, that is,

$$\hat{r} = \arg\max_r p(r \mid y_1) = \arg\max_r \frac{p(y_1 \mid r)\, p(r)}{p(y_1)} = \arg\max_r p(y_1 \mid r), \qquad p(r) = \frac{1}{4} \;\; \forall r.$$

Figure 3.12 shows graphically how $\hat{r}$ is chosen from an observation of $y_1$. Finally, let $\hat{x}$ be obtained from $\tilde{u}_2$ and $\hat{r}$ by inverting (3.33). Then it can be shown that the expected distortion can be bounded from above:


E[(x-x)^]  s:  i + 6F^--^| 


(3.34) 


where 


dx  . 


For $\sigma^2 = .0022$, the right hand side of (3.34) equals .00097. The distortion value for the best linear scheme, given by

$$J(H^*; 2) = \frac{\sigma^2}{2 + \sigma^2}$$

from (3.31), for the same $\sigma^2$, is .0011. Therefore, the nonlinear scheme gives lower distortion than just repeating. Since the error function F decreases very rapidly as $\sigma$ decreases, the nonlinear scheme becomes even better with smaller $\sigma$. For example, for $\sigma^2 = .001$, (3.34) equals .00017, and (3.31) equals .00052.
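Two ingredients of Counterexample 3.1 are easy to check numerically: the boundary A that splits a standard normal variable into four equally likely regions, and the maximum likelihood choice of the region from the first channel output. The sketch below is illustrative. The left-to-right numbering of the regions and the function names are assumptions, not taken from the text.

```python
from statistics import NormalDist

# Boundary A with P(0 < x < A) = P(x > A) = 1/4 for a standard normal x,
# i.e. the upper quartile point of the distribution.
A = NormalDist(mu=0.0, sigma=1.0).inv_cdf(0.75)


def region_number(x: float) -> int:
    """Region index r(x) in {1, 2, 3, 4}, numbered left to right (an assumption)."""
    if x < -A:
        return 1
    if x < 0.0:
        return 2
    if x < A:
        return 3
    return 4


def ml_region(y1: float, c: float) -> int:
    """Maximum likelihood estimate of r from y1 = c*r + Gaussian noise.

    With equal priors and a common noise variance, maximizing p(y1 | r)
    is the same as choosing the r whose mean c*r lies nearest to y1.
    """
    return min((1, 2, 3, 4), key=lambda r: abs(y1 - c * r))


if __name__ == "__main__":
    print(A, region_number(0.1), ml_region(2.9, 1.0))
```

The nearest-mean rule is why the decision regions for $\hat{r}$ are intervals of $y_1$ separated by the midpoints between consecutive values of cr.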
COUNTEREXAMPLE 3.2.* k < 1.

Let n = 2 and N = 1. It can be shown that an optimal linear encoder is $u = (x_1 + x_2)/\sqrt{2}$, with expected distortion

$$J^*_{LIN}(\sigma^2) = \frac{\tfrac{1}{2} + \sigma^2}{1 + \sigma^2}.$$

(In general, linear distortion is

$$J^*_{LIN}(\sigma^2) = \frac{\tfrac{n-1}{n} + \sigma^2}{1 + \sigma^2}.)$$

* See Appendix III-C for details.

Graphically, this scheme amounts to projecting all points in the $x_1, x_2$-space to the diagonal $x_1 = x_2$. One way to "fill the space" better, suggested by Shannon [9], is to construct a real number u by alternating the digits of $x_1$ and $x_2$; that is, if

$$x_1 = .a_1 a_2 a_3 \ldots, \qquad x_2 = .b_1 b_2 b_3 \ldots,$$

then

$$u = .a_1 b_1 a_2 b_2 a_3 b_3 \ldots$$

This nonlinear scheme fills the $x_1, x_2$-space much more than linear, but it is difficult to deal with analytically.
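The digit-alternating construction is simple to state in code. The sketch below is a finite-precision illustration only: it works on equal-length strings of decimal digits (the digits after the decimal point), whereas the construction in the text uses full infinite expansions. The function names are hypothetical.

```python
def interleave_digits(a_digits: str, b_digits: str) -> str:
    """Return the digits a1 b1 a2 b2 ... of the interleaved number.

    Both arguments are the digits after the decimal point, e.g. "141" for .141.
    Equal lengths are required in this finite sketch.
    """
    if len(a_digits) != len(b_digits):
        raise ValueError("digit strings must have equal length")
    if not (a_digits + b_digits).isdigit():
        raise ValueError("inputs must contain decimal digits only")
    return "".join(a + b for a, b in zip(a_digits, b_digits))


def deinterleave_digits(u_digits: str) -> tuple:
    """Recover (a_digits, b_digits) from the interleaved digit string."""
    if len(u_digits) % 2 != 0:
        raise ValueError("interleaved string must have even length")
    return u_digits[0::2], u_digits[1::2]


if __name__ == "__main__":
    u = interleave_digits("141", "592")
    print("." + u, deinterleave_digits(u))
```

Without noise the two coordinates are recovered exactly from the single number, but an error in a leading digit of u lands in a leading digit of one coordinate, which hints at why the scheme is hard to analyze once channel noise is added.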

A simpler nonlinear scheme that fills the space more than linear is shown in Figure 3.13, where $\theta_1$ and $\theta_2$ are transformations f of $x_1$ and $x_2$, respectively, to the interval [-1, 1]. All points are mapped to the dotted lines in the following way:

$$a \le \theta_2 \le b \;\Rightarrow\; (\theta_1, \theta_2) \to \left(\theta_1, \frac{a + b}{2}\right),$$

where [a, b] is one of the four intervals of length 1/2 that partition [-1, 1]. Let $\theta_2 = 3/4$ correspond to row number r = 4, $\theta_2 = 1/4$ to r = 3, etc. Straighten out the dotted line and compress it to fit into the interval [-1, 1], and call the variable u, as shown in Figure 3.14. Then it can be shown that

$$u = \frac{1}{4}\left[(-1)^r \theta_1 + 5 - 2r\right]. \tag{3.35}$$
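The row-by-row mapping can be sketched in code. The encoder below follows the formula $u = \frac{1}{4}[(-1)^r \theta_1 + 5 - 2r]$ with rows r = 4, 3, 2, 1 centered at $\theta_2$ = 3/4, 1/4, -1/4, -3/4. The decoder (clip the received value to [-1, 1], pick the row from the quarter of the interval it falls in, then invert the formula) is an assumed reading rather than a transcription of the text, and the function names are hypothetical.

```python
def row_number(theta2: float) -> int:
    """Row index r in {1, 2, 3, 4}; r = 4 is the top row (theta2 in [1/2, 1])."""
    if theta2 >= 0.5:
        return 4
    if theta2 >= 0.0:
        return 3
    if theta2 >= -0.5:
        return 2
    return 1


def encode(theta1: float, theta2: float) -> float:
    """Map a point of the square [-1, 1]^2 to a scalar u in [-1, 1]."""
    r = row_number(theta2)
    return ((-1) ** r * theta1 + 5 - 2 * r) / 4.0


def decode(u_received: float) -> tuple:
    """Assumed decoder: clip, choose the row, then invert the encoding formula."""
    u = max(-1.0, min(1.0, u_received))
    if u <= -0.5:
        r_hat = 4
    elif u <= 0.0:
        r_hat = 3
    elif u <= 0.5:
        r_hat = 2
    else:
        r_hat = 1
    theta1_hat = (-1) ** r_hat * (4.0 * u - 5 + 2 * r_hat)
    theta2_hat = (2 * r_hat - 5) / 4.0  # midpoint of the chosen row
    return theta1_hat, theta2_hat


if __name__ == "__main__":
    print(decode(encode(0.3, 0.6)), decode(encode(0.3, 0.1)))
```

The alternating sign makes consecutive rows join end to end, so the straightened curve is continuous: a small noise perturbation near a row boundary changes the decoded row but moves the decoded point only slightly.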


Next u is sent through the channel. Let $\tilde{u} = y = u + \epsilon$. Then let


a s - 1 


Os  u ^2 


u 


(-1)  (4u-5  + 2f),  -Isus  1 (just  invert  (3. 35)) 

-1  , u < -1 


u > 1 


^ §1  [-1,1] 
Let $f(x) = (2/\pi) \tan^{-1} x$, and let expected distortion be defined as


Then  it  can  be  shown  that 


^ B(a^) 


where 


B(a^) 


= 2 { [2’ 


^Ibu^  t .354 


(1-P,)  4 


(3.  36) 


and $P_e = \Pr[\hat{r} \ne r]$ = probability of error. $P_e$ is a complicated expression because it involves the probability density of u, a difficult thing to compute. However, $P_e$ is a continuous increasing function of $\sigma^2$. If $\sigma^2 = 0$, then $P_e = 0$ and $B(0) = .177 < J^*_{LIN}(0)$. If $\sigma^2$ is sufficiently small, then B will still be less than $J^*_{LIN}(\sigma^2)$, as shown in Figure 3.15. However, as $\sigma^2 \to \infty$, $P_e \to 1$, so that B approaches its limiting value. (The $B(\sigma^2)$ curve is qualitative and not based on numerical calculations.) Therefore, for sufficiently small $\sigma^2$, the nonlinear scheme is better than linear.

5.4 Asymptotic Effects

In Chapter II it was shown that the equilibrium solutions for Example 2.3 of the job market model exhibited threshold effects. Since the solutions to the n = N case of the Shannon problem are known, we might ask whether they also exhibit threshold effects. However, as expressions (3.27) and (3.26) for the n = N = 1 case indicate, asymptotic, not threshold, effects occur. That is, signaling ceases only in the limit as the parameters $\alpha$ and $\sigma^2$ approach 0 or $\infty$. For example, both DMs still feel it worthwhile to signal even if signaling is very costly (tighter constraint on signal power, i.e., small $\alpha$) or very noisy (large $\sigma^2$).

With  no  noise,  v = x as  in  Example  2.  1,  demonstrating  that 
asymptotic  effects  can  occur  in  the  Spence  problem  as  well,  unless 
there  are  extra  constraints,  such  as  a minimum  ability  level  and  a 
maximum  educational  level.  Therefore,  we  cannot  state  any  general 
results  as  to  which  effects,  threshold  or  asymptotic,  will  occur  in 
any  given  problem.  The  payoff  structure  (team  vs.  NZS)  and  restrictions 
on  the  random  variables  (continuous  vs.  discrete,  and  infinite  vs. 
finite  range)  are  prime  candidates  as  the  factors  which  determine  the 
type  of  parameter  effects. 

5. 5 Summary 

In  general,  real-time  information  theory  solutions  are  sub- 
optimal  as  compared  to  " infinite- time " (Shannon)  theory  solutions. 
However,  if  the  dimension  n of  a Gaussian  source  with  a mean 
square  error  distortion  function  is  equal  to  the  dimension  N of  a 
memoryless  Gaussian  channel  with  a square  cost  function,  the  source 
and  channel  can  be  directly  connected,  with  appropriate  scaling  of  the 
channel  inputs,  so  as  to  satisfy  the  power  constraint.  The  distortion 
incurred is the best one can do, since it equals the Shannon bound. If
N > n (k = N/n > 1), then repeating each source symbol k times is a
simple suboptimal strategy. For small values of k, its performance
is close to optimal. If N < n (k < 1), Theorem 3.1 does not apply, so
that little, in general, can be said. However, counterexamples with
k = 2 and k = 1/2 are described where nonlinear encoders are better
than the best linear ones.
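As a rough illustration of how repetition compares with the Shannon bound, the sketch below evaluates both for a few values of k. The linear distortion is taken from (3.31). The Shannon bound is written as $(1 + a/\sigma^2)^{-k}$, which is the standard result of equating the rate-distortion function of a unit-variance Gaussian source with k uses of the channel capacity; that expression and both function names are assumptions of this sketch, not formulas quoted from this section.

```python
def linear_distortion(a: float, sigma2: float, k: float) -> float:
    """Distortion of the best linear (repetition) encoder, from (3.31)."""
    return sigma2 / (a * k + sigma2)


def shannon_bound(a: float, sigma2: float, k: float) -> float:
    """Assumed Shannon bound for k channel uses per unit-variance source symbol."""
    return (1.0 + a / sigma2) ** (-k)


if __name__ == "__main__":
    a, sigma2 = 1.0, 1.0
    for k in (1, 2, 3, 4):
        lin = linear_distortion(a, sigma2, k)
        opt = shannon_bound(a, sigma2, k)
        print(f"k = {k}: repetition {lin:.4f}, Shannon bound {opt:.4f}")
```

The two expressions coincide at k = 1 and separate as k grows, which matches the statement that repetition is optimal for n = N and only approximately so for small k > 1.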
APPENDIX III-A

PROOFS  FOR  RESULTS  ON  OPTIMAL  LINEAR  CODERS 

Before going on to prove Theorem 3.1, we digress to note that
for the linear solution (3.27), $\hat{x} = v$ is the estimate of $x$ that
would be produced by a Kalman filter. The expected distortion, $E[D(x, v)]$,
is simply the error covariance, referred to as $P$ in the control
literature (see Bryson and Ho [3] for derivation of Kalman filter
formulas). This fact makes it very easy to evaluate $\hat{x}$ and $P$ from
linear encoders and decoders for arbitrary values of N and n, using
the standard formulas from Kalman filters for the special case where
$x$ is time-invariant. In general, if $x \sim N(\bar{x}, M)$, where $M$ is
the $n \times n$ covariance of $x$, $e \sim N(0, R)$, and $y = Hx + e$, with $H$ an
$N \times n$ matrix, then from the Kalman filter

$$\hat{x} = \bar{x} + P H^T R^{-1} (y - H\bar{x}) \qquad (3A.1)$$

$$P^{-1} = M^{-1} + H^T R^{-1} H . \qquad (3A.2)$$
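To make (3A.1) and (3A.2) concrete, the following minimal sketch evaluates them for the special case of a scalar source (n = 1) observed through N channel uses with $R = \sigma^2 I$, so that no matrix inversion is needed. The function name and argument names are hypothetical, and the restriction to n = 1 is a simplification chosen for this illustration only.

```python
import math


def kalman_scalar_source(x_bar, M, h, sigma2, y):
    """Estimate a scalar x ~ N(x_bar, M) from y_i = h_i * x + e_i, e_i ~ N(0, sigma2).

    This is (3A.1)-(3A.2) with n = 1, H the column vector h, and R = sigma2 * I.
    Returns the estimate x_hat and the error variance P.
    """
    # (3A.2): P^{-1} = M^{-1} + H^T R^{-1} H
    information = 1.0 / M + sum(hi * hi for hi in h) / sigma2
    P = 1.0 / information
    # H^T (y - H x_bar), the innovation projected back onto the source
    innovation = sum(hi * (yi - hi * x_bar) for hi, yi in zip(h, y))
    # (3A.1): x_hat = x_bar + P H^T R^{-1} (y - H x_bar)
    x_hat = x_bar + P * innovation / sigma2
    return x_hat, P


if __name__ == "__main__":
    a, sigma2 = 2.0, 0.5
    x_hat, P = kalman_scalar_source(0.0, 1.0, [math.sqrt(a)], sigma2, [1.0])
    print(f"x_hat = {x_hat:.4f}, P = {P:.4f}")
```

For n = N = 1 with a zero-mean, unit-variance source and $h = \sqrt{a}$, the returned variance is $\sigma^2/(a + \sigma^2)$, in agreement with Theorem 3.1 at k = 1.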

In the special case of n = N = 1, it is easy to check that for our model,
$H = \sqrt{a}$, and $\hat{x}$ reduces to Equation (3.27b) and $P$ to (3.26). If we
restrict ourselves to linear encoders $\gamma_1$ that satisfy the power
constraint with equality,* then the constraint becomes

$$a = \frac{1}{N}\operatorname{tr} E[u u^T]
    = \frac{1}{N}\operatorname{tr} E(H x x^T H^T)
    = \frac{1}{N}\operatorname{tr}(H^T H \, E x x^T)
    = \frac{1}{N}\operatorname{tr}(H^T H) . \qquad (3A.3)$$

*We always assume equality in the constraint, because the rate of
transmission increases with power. Thus, it is advantageous to use
up all the power available.

From (3A.2) and (3A.3), (3.30) becomes:

$$\min_{H \in \mathcal{H}} J(H) = \frac{1}{n}\operatorname{tr}\left(I_n + \frac{1}{\sigma^2} H^T H\right)^{-1}
  \quad \text{s.t.} \quad \frac{1}{N}\operatorname{tr}(H^T H) = a , \qquad (3A.4)$$

where $\mathcal{H}$ is the set of $N \times n$ matrices of maximal rank.

THEOREM 3.1. Let $H^*$ be the optimal solution to (3.30).
Then for $k \ge 1$

$$J(H^*; k) = \frac{\sigma^2}{ak + \sigma^2} . \qquad (3.31)$$

Proof. Since $I_n > 0$ and $H^T H \ge 0$, then

$$I_n + \frac{1}{\sigma^2} H^T H > 0 ,$$

and from (3A.4)
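$$J(H) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{\lambda_i} ,$$

where the $\lambda_i$ are the eigenvalues of $I_n + \frac{1}{\sigma^2} H^T H$ and $\lambda_i > 0$ for
all $i$. It is trivial to show that

$$\lambda_i = 1 + \mu_i ,$$

where the $\mu_i$ are the eigenvalues of $\frac{1}{\sigma^2} H^T H$. Then the constraint
(3A.3) becomes

$$\frac{\sigma^2}{N} \sum_{i=1}^{n} \mu_i = a ,
  \quad \text{or} \quad
  \sum_{i=1}^{n} \lambda_i = \sum_{i=1}^{n} (1 + \mu_i) = n + \frac{Na}{\sigma^2} .$$

So the problem is now:

$$\min_{\lambda_1, \ldots, \lambda_n} \; \frac{1}{n} \sum_{i=1}^{n} \frac{1}{\lambda_i}
  + r \left( \sum_{i=1}^{n} \lambda_i - n - \frac{Na}{\sigma^2} \right) ,$$

where $r$ is the Lagrange multiplier. Setting the derivative with respect
to each $\lambda_i$ equal to zero gives

$$-\frac{1}{n \lambda_i^2} + r = 0
  \;\Rightarrow\; \lambda_i^* = \frac{1}{\sqrt{nr}} \quad \forall i$$

(positive square root because eigenvalues positive).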
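In order for all $\lambda_i$ to be equal, we must have all $\mu_i = \lambda_i - 1$ equal.
This means we must have $H^T H > 0$. Since $k \ge 1$ and $H$ is of
maximal rank, then $H^T H$ is, in fact, positive definite (see [3],
p. 444). (If $k < 1$, then $H^T H$ can only be assumed to be positive
semi-definite.) Then

$$\sum_{i=1}^{n} \lambda_i^* = \frac{n}{\sqrt{nr}} = n + \frac{Na}{\sigma^2} ,$$

$$\sqrt{nr} = \frac{n\sigma^2}{n\sigma^2 + Na} = \frac{\sigma^2}{\sigma^2 + ak} ,$$

$$J(H^*) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{\lambda_i^*}
         = \frac{1}{n} \sum_{i=1}^{n} \sqrt{nr}
         = \sqrt{nr}
         = \frac{\sigma^2}{\sigma^2 + ak} .$$

Q.E.D.

COROLLARY 3.1. For $k \ge 1$, $J(\hat{H}) = J(H^*)$.

Proof. With $\hat{H}$ as in (3.32), then

$$J(\hat{H}) = \frac{1}{n}\operatorname{tr} P
  = \frac{1}{n}\operatorname{tr}\left(I_n + \frac{1}{\sigma^2} \hat{H}^T \hat{H}\right)^{-1}
  = \frac{1}{n}\operatorname{tr}\left(I_n + \frac{ak}{\sigma^2} I_n\right)^{-1}
  = \frac{\sigma^2}{ak + \sigma^2} .$$

Q.E.D.

Therefore, within the class of linear encoders $H$, none gives
lower distortion than $\hat{H}$.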
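The eigenvalue argument in the proof can be checked numerically. The sketch below assumes the reduced problem as reconstructed from the proof: minimize $(1/n)\sum 1/\lambda_i$ subject to $\lambda_i \ge 1$ and $\sum \lambda_i = n + Na/\sigma^2$. It compares the equal-eigenvalue allocation against randomly drawn feasible allocations; all function names are illustrative.

```python
import random


def objective(lams):
    """J expressed through the eigenvalues: (1/n) * sum of 1/lambda_i."""
    return sum(1.0 / lam for lam in lams) / len(lams)


def theorem_value(a, sigma2, k):
    """Closed form (3.31)."""
    return sigma2 / (a * k + sigma2)


def random_feasible(n, total, rng):
    """Random lambda_i >= 1 with sum equal to `total` (so mu_i >= 0)."""
    weights = [rng.random() for _ in range(n)]
    scale = (total - n) / sum(weights)
    return [1.0 + w * scale for w in weights]


if __name__ == "__main__":
    n, N, a, sigma2 = 3, 6, 1.0, 0.5
    total = n + N * a / sigma2          # right-hand side of the eigenvalue constraint
    equal = [total / n] * n             # all eigenvalues equal, as in the proof
    rng = random.Random(0)
    best_random = min(objective(random_feasible(n, total, rng)) for _ in range(1000))
    print("equal eigenvalues:", objective(equal))
    print("closed form (3.31):", theorem_value(a, sigma2, N / n))
    print("best of 1000 random allocations:", best_random)
```

No random allocation should fall below the equal-eigenvalue value, since the mean of the reciprocals is never smaller than the reciprocal of the mean.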




APPENDIX III-B


COMPUTATIONAL DETAILS FOR COUNTEREXAMPLE 3.1


Referring  to  Figure  3.  13,  for 
r = 1,  let  + 3j 

r = 2,  let  b(^  i-l) 

r = 3,  let  U2  = b(-^  - l) 

r = 4,  let  U2  = b(^  - 3^ 

U2  - b(^  + g(r(x») 


where 


g(r)  = 5 - 2r 


Since  a = 1,  the  power  constraint  has  been  nornnalized  to 


I E(u2  + u2)  = 1 


Let  Uj  = cr  for  some  scalar  c.  Then  a special  case  of  the 
is: 


As  in  Appendix  III-A,  we  assume  equality  in  the  constraint. 


(3B.  1 


(3B.  2 


constraint 




1 


2tr  2 

c Er 


15  2 

2 


Also 


1 = Eu2  = B^E 


^ ¥ g(r(x))  + g(r(x))^ 


= 


\ +1  E(xg)  + 5 


where 


E(xg)  = 


4 


X p(x)  dx  + 3 I xp(x)  dx 

/a 


2:  - 2,08. 


Therefore

$$1 = 1.5 \, b^2$$

or

$$b^2 = \tfrac{2}{3} , \quad b \approx .82 . \qquad (3B.3)$$
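The constants appearing in this appendix can be cross-checked with a short computation. The sketch below rests on two assumptions that are ours, not statements from the text: that the source density p(x) is N(0, 1), and that the four regions r = 1, ..., 4 of Figure 3.13 are the equiprobable quartile cells of that density. Under those assumptions it reproduces $E r^2 = 15/2$, $E g^2 = 5$, a value of $E(xg)$ close to the quoted $-2.08$, and $b \approx .82$ from $1 = 1.5\,b^2$.

```python
import math
from statistics import NormalDist

std_normal = NormalDist()              # assumed source density p(x)
cut = std_normal.inv_cdf(0.75)         # assumed cell boundary (upper quartile)


def g(r):
    """Offset function g(r) = 5 - 2r from this appendix."""
    return 5 - 2 * r


def pdf(x):
    """Standard normal density, taken as 0 at +/- infinity."""
    return 0.0 if math.isinf(x) else std_normal.pdf(x)


levels = (1, 2, 3, 4)
cells = {1: (-math.inf, -cut), 2: (-cut, 0.0), 3: (0.0, cut), 4: (cut, math.inf)}

E_r2 = sum(r * r for r in levels) / 4              # equiprobable r
E_g2 = sum(g(r) ** 2 for r in levels) / 4
# For N(0, 1): integral of x p(x) over (lo, hi) equals p(lo) - p(hi).
E_xg = sum(g(r) * (pdf(lo) - pdf(hi)) for r, (lo, hi) in cells.items())
b = math.sqrt(1.0 / 1.5)                           # from 1 = 1.5 b^2

if __name__ == "__main__":
    print(f"E r^2 = {E_r2}, E g^2 = {E_g2}, E(xg) = {E_xg:.3f}, b = {b:.3f}")
```

The small gap between the computed $E(xg)$ and the quoted value is consistent with rounding of the cell boundary in a hand calculation, although the original figure is needed to confirm the cell definition.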

Figure 3.14 shows graphically how $\hat{r}$ is chosen from an observation of
$y_1$. Transform to $\tilde{y}_1 = (y_1 - cr)/\sigma \sim N(0, 1)$. Then the cut-off
points (3/2)c, (5/2)c, (7/2)c in Figure 3.14 are transformed to $\pm c/2\sigma$.
Let $F$ be the error function

$$F(y) = \int_{-\infty}^{y} \frac{1}{\sqrt{2\pi}} \, e^{-x^2/2} \, dx .$$


APPENDIX III-C


COMPUTATIONAL  DETAILS  FOR  COUNTEREXAMPLE  3.2 


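A general linear scheme for this example is

$$u = \begin{bmatrix} a & b \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = a x_1 + b x_2 .$$

Power constraint:

$$E[u^2] = 1 = E[(a x_1 + b x_2)^2] = a^2 + b^2 . \qquad (3C.1)$$

Then

$$\begin{aligned}
y &= u + e \\
  &= a x_1 + b x_2 + e \\
  &= a x_1 + (b x_2 + e) \;\Rightarrow\; \hat{x}_1 = E(x_1 \mid y) = \frac{a}{a^2 + b^2 + \sigma^2} \, y \\
  &= b x_2 + (a x_1 + e) \;\Rightarrow\; \hat{x}_2 = E(x_2 \mid y) = \frac{b}{a^2 + b^2 + \sigma^2} \, y
\end{aligned}$$

$$E[D(x, v)] \triangleq J_L = \tfrac{1}{2} E[(x_1 - \hat{x}_1)^2 + (x_2 - \hat{x}_2)^2]$$

$$= \frac{1}{2(a^2 + b^2 + \sigma^2)^2} \left[ (a^2 + \sigma^2)^2 + 2a^2 b^2 + (b^2 + \sigma^2)^2 + (a^2 + b^2)\sigma^2 \right] .$$

Substituting in the constraint (3C.1), this becomes

$$J_L(\sigma^2) = \frac{1 + 2\sigma^2}{2(1 + \sigma^2)} , \qquad (3C.2)$$

which is independent of $a$ and $b$. Therefore, any $a$ and $b$ satisfying
(3C.1) will be an optimal linear solution.* For example, consider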


For the nonlinear scheme described in Counterexample 3.2,
$\theta = f(x)$. For the case of $f(x) = \frac{2}{\pi} \tan^{-1} x$,


*That is, the constraint is rotationally invariant.
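The rotational invariance of the linear scheme is easy to confirm numerically. The sketch below evaluates the bracketed distortion expression for unit-variance, independent $x_1$, $x_2$ and several coefficient pairs on the unit circle, and compares it with the closed form $(1 + 2\sigma^2)/(2(1 + \sigma^2))$ obtained by substituting $a^2 + b^2 = 1$. The function names are illustrative.

```python
import math


def linear_distortion(a, b, sigma2):
    """Average mean square error of the linear scheme u = a*x1 + b*x2."""
    s = a * a + b * b + sigma2
    bracket = ((a * a + sigma2) ** 2 + 2 * a * a * b * b
               + (b * b + sigma2) ** 2 + (a * a + b * b) * sigma2)
    return bracket / (2 * s * s)


def closed_form(sigma2):
    """Distortion after substituting the power constraint a^2 + b^2 = 1."""
    return (1 + 2 * sigma2) / (2 * (1 + sigma2))


if __name__ == "__main__":
    sigma2 = 0.3
    for degrees in (0, 30, 45, 60, 90):
        t = math.radians(degrees)
        d = linear_distortion(math.cos(t), math.sin(t), sigma2)
        print(f"angle {degrees:3d} deg: J_L = {d:.6f} (closed form {closed_form(sigma2):.6f})")
```

Every angle gives the same value, and with no noise the value is .5, which is the figure against which the nonlinear scheme is compared.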




A v2  1 


= |e[(Xj-Xj)  + (X2-X2)  ] 


'NL 


= J (x2-.4l)^p(x2)dx2+  (x2-2.4)^P(x2)  dx2 


'0 

= . 177 


From (3C.2), $J_L(0) = .5 > J_{NL}(0)$; therefore, since the nonlinear
coding gives lower distortion than the linear when there is no noise, it
must do the same for sufficiently small noise. In fact, we can bound
$J_{NL}(\sigma^2)$ for small $\sigma^2$.

Let

$$g(\theta_i) = g(f(x_i)) = x_i \quad (\text{i.e., } g = f^{-1}) . \qquad (3C.3)$$

Then

$$g'(f(x_i)) \, f'(x_i) = 1 , \quad \text{so} \quad g'(f(x_i)) = \frac{1}{f'(x_i)} . \qquad (3C.4)$$
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(e^-e^)^  if  r = r (or  E[(x2-X2)^|?  = r]  = . 354  = 


(02-02)^  if  <^3  if  f/r, 


since  max 


l®2"®2 


$$\hat{x}_i = g(\hat{\theta}_i) = g(\theta_i) + g'(\theta_i) \, \delta_i + \frac{g''(\tau_i)}{2!} \, \delta_i^2 ,$$

$\tau_i$ between $\theta_i$ and $\hat{\theta}_i$,


from  (3C.  3)  and  (3C.4)  (for  very  small  6.,  i.  e.  , small  noise 

X.  - X.  « 

= i E[(Xj-Xj)2  ^ 

- i=[(r575r‘5+(i^r  ‘z] 

= + p-^p  = 'i 


+ E 


*See end of this appendix for bound on $|g''(\tau_i)| / 2!$.


1 

I'? 

i 


2J„l(0)) 
Let $P_e = \Pr[\hat{r} \ne r]$ (probability of error). Also, since
$f(x) = \frac{2}{\pi} \tan^{-1} x = \theta$,

then 

= I • 

Thus 

^ B(a^)  , 

where 

B(a^)  = 7 I [l  16a^  + . 354j  (1-P^)  + 10  | P^ 
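Bound on $|g''(\tau)|$: Let $\tau \in [\theta, \hat{\theta}]$ and $\delta = \hat{\theta} - \theta$.

$$g(\theta) = \tan \frac{\pi}{2}\theta$$

$$g'(\theta) = \frac{\pi}{2} \left( 1 + \tan^2 \frac{\pi}{2}\theta \right)$$

$$g''(\theta) = \pi \tan\left(\frac{\pi}{2}\theta\right) g'(\theta) \qquad (3C.5)$$

Thus, we see from Figures 3C.3 and 3C.4 that

$$|g(\tau)| \le \max\{ |g(\theta)|, |g(\hat{\theta})| \} = A(\theta, \hat{\theta})$$

$$|g'(\tau)| \le \max\{ |g'(\theta)|, |g'(\hat{\theta})| \} = B(\theta, \hat{\theta}) .$$

From (3C.5),

$$|g''(\tau)| \le \pi \, A(\theta, \hat{\theta}) \, B(\theta, \hat{\theta}) .$$

Therefore, $|g''(\tau)|$ will not blow up as $\delta$ gets smaller, since $A$ and
$B$ do not blow up as $\delta$ gets smaller.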
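The derivative formulas and the resulting bound can be verified numerically. The sketch below assumes $g(\theta) = \tan(\pi\theta/2)$, the inverse of $f(x) = (2/\pi)\tan^{-1} x$, checks the expressions for $g'$ and $g''$ against central finite differences, and tests the bound on $|g''(\tau)|$ over a grid of points between two sample values of $\theta$ and $\hat{\theta}$. The helper names and the sample values are illustrative choices.

```python
import math


def g(theta):
    return math.tan(math.pi * theta / 2)


def g1(theta):
    """First derivative of g."""
    return (math.pi / 2) * (1 + math.tan(math.pi * theta / 2) ** 2)


def g2(theta):
    """Second derivative of g, written as pi * tan(pi*theta/2) * g'(theta)."""
    return math.pi * math.tan(math.pi * theta / 2) * g1(theta)


def central_diff(fn, x, h=1e-5):
    """Central finite-difference approximation of fn'(x)."""
    return (fn(x + h) - fn(x - h)) / (2 * h)


def second_derivative_bound(theta, theta_hat):
    """pi * A * B, with A and B the endpoint maxima of |g| and |g'|."""
    A = max(abs(g(theta)), abs(g(theta_hat)))
    B = max(abs(g1(theta)), abs(g1(theta_hat)))
    return math.pi * A * B


if __name__ == "__main__":
    theta, theta_hat = 0.20, 0.25
    print("g' check :", g1(0.3), central_diff(g, 0.3))
    print("g'' check:", g2(0.3), central_diff(g1, 0.3))
    grid = [theta + (theta_hat - theta) * j / 100 for j in range(101)]
    print("max |g''| on grid:", max(abs(g2(t)) for t in grid))
    print("bound pi*A*B     :", second_derivative_bound(theta, theta_hat))
```

Because $|g|$ and $g'$ both grow with $|\theta|$ on $(-1, 1)$, their maxima over an interval are attained at an endpoint, which is why the bound stays finite as $\delta$ shrinks.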


CHAPTER  IV 
CONCLUSION 

The main features of the Spence problem of Chapter II, from a
decision and control point of view, are the dynamic information
structure (i.e., signaling) and the multiple equilibria. Multiple Nash
equilibria in a noncooperative NZS game are undesirable, because
DM1 may choose one equilibrium strategy, say $\gamma_1^a$, but DM2 might
not choose the corresponding $\gamma_2^a$, but rather some other equilibrium $\gamma_2^b$.
The pair $(\gamma_1^a, \gamma_2^b)$, in general, is not an equilibrium. By assuming
certain parameters were fixed instead of variable, we avoided this
problem and obtained a unique equilibrium.

As  a vehicle  for  insights,  the  model  was  set  up  as  a two-person 
decision  problem.  This  allowed  us  not  only  to  find  new  solutions,  but 
also  to  handle  modifications  of  the  problem  more  easily.  For  example, 
we  defined  an  adjustment  procedure  for  each  decision  maker  and 
proved  sufficient  conditions  for  stability.  We  also  investigated 
threshold  effects  and  found  that,  under  certain  circumstances,  signaling 
ceases  when  different  parameters  in  the  problem  are  varied.  The 
main results were that if signaling cost or signaling noise is too
high, or if the variability of the underlying unknown signal is too low,
then signaling is not worthwhile. Therefore, from what originally
appeared as a very simple example, a tremendous richness of detail
and insight has emerged.



Extending  the  decision  theory  framework,  we  saw  in  Chapter  III 
how  Shannon  information  theory  can  be  modeled  as  a two-person  team 
problem  with  signaling.  This  set-up  allowed  us  to  discuss  coding 
problems  in  which  the  delay  between  emission  of  source  symbols  and 
transmission of coded signals was fixed. This was called "real-time
information theory," suggesting applications where coding precedes
actions  which  must  be  taken  within  a specified  time.  Since  Shannon's 
theorem  states  the  best  we  can  do  with  no  delay  restriction,  it  provides 
a bound  against  which  we  can  judge  the  performance  of  the  real-time 
scheme.  General  results  about  performance  were  also  derived;  for 
example,  repetition  of  source  symbols  as  an  encoding  scheme  is  as 
good  as  the  best  linear  encoder.  However,  if  the  block  lengths  of  the 
source  and  channel  are  equal,  then  for  both  variable  and  fixed  delay, 
linear  encoders  and  decoders  are  optimal;  that  is,  they  attain  the 
Shannon  bound.  If  the  block  lengths  are  not  equal,  then  for  fixed  delay, 
linear  may  not  be  optimal. 

The major contribution of this work is not to prove significantly
new results, but rather to unify the disparate fields of team theory,
market signaling in economics, information structures, and classical
information theory. Hopefully, the general conceptual framework
presented here will encourage joint efforts among researchers in these
separate fields.
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